This article is part of a series on SOA Development and Delivery.
Let’s get started on our SOA Development and Delivery journey by talking about version control. But before we delve right in, let’s take a moment to reflect on this axiom:
Axiom 1: Developing a SOA Composite Application is software development.
When we sit down to create a SOA Application, i.e. composites, user interfaces, services, etc., we are actually embarking on a software development exercise. I think that some people don’t believe (or at least they don’t admit to themselves) that this is the case. Why? Well, acknowledging that it is, in fact, software development implies that a whole bunch of practices are necessary – like version control and testing, for example. And those are hard, right?
Well, they are certainly more work. But it’s a bit like insurance – it is a cost you accept in the present to offset or prevent a much more significant potential cost/pain in the future. In their quintessential book Continuous Delivery, Dave Farley and Jez Humble say:
“In software, when something is painful, the way to reduce the pain is to do it more frequently, not less.”
Putting in the extra effort upfront will save you from a lot more pain and effort later on. Using version control fits into this category. It can be a bit painful, but it is definitely worth it in the end. Consider this axiom:
Axiom 2: Developing software without version control is like mixing chemicals without reading the labels – sooner or later, it is going to blow up in your face.
Why is this true? Consider the following questions – how would you answer these if you are not using version control?
- What was the content of (some BPEL file) on (some day in the past)?
- Who changed (some endpoint URL) in (some BPEL file)? When? Why?
- We found a problem in production, which version of the source matches this SAR we have in production?
- Can I get a copy of that source, with no other changes or updates to any other files?
- How come I can’t build this? It must have worked before!
- Where is the deployment script for this composite?
- How did we set up the test environment that we used to test this composite?
- Which version of the SOA configuration plans matches this version of the SAR?
- Which version of the OSB project and the ADF UI matches this version of the composite?
The answer is, of course, that you can’t answer them. At least not with any level of confidence. I would go so far as to say that if you don’t have your SOA applications under version control, you should stop whatever you are doing, and go set it up right now! Sooner or later you are going to encounter a critical problem in your production environment that you simply cannot fix.
That leads us to one more axiom:
Axiom 3: If it is not in version control, it does not exist.
If you cannot retrieve a previous revision of an artifact, then it may as well not have ever existed at all – it is of approximately the same value to you either way.
Now, what do I mean by ‘using’ version control? Well, it is more than just checking in your changes. That is an excellent start, and it is much better than nothing at all, but really, you should be thinking about a few more advanced ways of using your version control system as well – like branching and tagging for example. We will come back and discuss these in some depth later in this article.
For now, let’s cover some basics.
Which version control system to use?
There are a number of excellent version control systems available, both free and commercial. I have used most of them – rcs, SCCS, CVS, Subversion, ClearCase, Perforce, Mercurial, and git to name a few, and even some proprietary ones that exist only inside the organizations that use them or are included in a particular operating system (e.g. OpenVMS) or file system (e.g. zfs).
Today I mostly use Subversion, but I am going through a transition to git. Let’s talk about these two in a little more detail.
Firstly, both of them are free, and both are widely used. That means that there is good tool support, a large community of people creating helpful content about how to use them, and a pretty good chance that anyone you get working on your project will have some level of familiarity with them. They both also have excellent, freely available books to help you get started: Version Control with Subversion and Pro Git.
The other thing that is very useful about these two, specifically in the context of Oracle SOA Suite development, is that they use an atomic commit which means that you can change a number of files and commit all of those changes as a single revision. CVS, for example, does not allow you to do this because it versions each file individually. This capability is great in a SOA environment, as many of the changes we need to make involve making changes to more than one file. Having the project in a state where some, but not all, of those file changes are committed, is not useful.
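To make the idea of an atomic commit concrete, here is a minimal sketch in Python. It is a toy in-memory model, not how Subversion or git are actually implemented, and the class name and file names are hypothetical. The point it illustrates is that every file in a commit lands under one revision number, or none of them do.

```python
class AtomicRepository:
    """Toy in-memory repository: one revision number covers every file in a commit."""

    def __init__(self):
        # Revision 0 is the empty repository.
        self.revisions = [{}]

    def commit(self, changes):
        """Record all of `changes` (path -> content) as one new revision, or none of them."""
        for path, content in changes.items():
            if not isinstance(content, str):
                # Reject the whole commit before anything has been recorded.
                raise ValueError(f"invalid content for {path}")
        snapshot = dict(self.revisions[-1])
        snapshot.update(changes)
        self.revisions.append(snapshot)
        return len(self.revisions) - 1


repo = AtomicRepository()
# Hypothetical file names: a change that touches two files of one composite.
rev = repo.commit({
    "composite.xml": "<composite/>",
    "ProcessOrder.bpel": "<process/>",
})
```

Both files share the single revision number returned by `commit`, which is the property that makes it possible to retrieve a consistent set of files later.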
JDeveloper supports Subversion quite well, although you do need to invest a little time to make sure you know how to drive it. And support for git was added in JDeveloper 126.96.36.199.
There are three key things that are driving my personal migration from Subversion to git:
- The increasing demand for distributed development, especially when multiple organizations are involved in the development lifecycle – git is one of the fleet of new distributed version control systems,
- The ability to commit, branch, merge, etc., while offline (and away from the watchful eye of the continuous integration server), and
- The availability of excellent tools (like GitLab) that provide easy visibility into the repository itself, and into the branching and merging over time.
I personally find the git workflow more complex than the Subversion one, but for me at least, the time has come to make the move. I think that moving to distributed version control is pretty much inevitable in the modern world.
I think the most important thing to say about which version control system is the right one to use is this – any is better than none. If you want me to recommend one, I would have to say git.
What should we put in version control?
Version control is for source artifacts, not derived artifacts like binaries, deployable packages, etc. Source artifacts does not just mean source code – it means anything that is needed to recreate your production environment from scratch. Craig Barr proposed an excellent list in this article which I am quoting here:
- “OSB Configuration
- SOA Projects
- Customization Files
- Composite Configuration Plans
- WebLogic Deployment Plans
- Build Scripts
- Test Scripts
- Deployment Scripts
- Release Scripts
- Start-up & Shutdown Scripts
- “Health Check” Scripts
- Application Server Configuration
- Puppet Configuration
- (Optionally) The Binaries
Note: This is unnecessary and redundant if you follow good binary management which I’ll discuss in the next blog installment.
- And so on….”
Personally, I do not agree with putting your binaries into your version control system. I think that binaries belong in a separate repository, because they have quite different characteristics and management needs. We’ll talk a lot more about this in a future post on binary management, but for now a couple of examples to illustrate the point:
|Source artifacts||Binaries|
|Tend to be relatively small files||Tend to be relatively large files|
|Tend to change frequently||Tend to never change after they are created|
|Usually we want to keep all revisions||Often we only want to keep important and recent revisions|
|Are created by a person||Are (or at least should be) created by some automated/programmatic process|
|Cannot easily be recreated if they are lost or damaged||Can be easily recreated if they are lost or damaged (assuming you still have the source, etc.)|
So what do you put into version control for SOA, OSB, ADF?
The simplest answer to this question is ‘whatever is left in the project after you have executed the clean action on the project in the IDE.’ Note that you would need to make sure you have disabled automatic builds in Eclipse, otherwise it will just go ahead and build again.
We need to be careful about a few things here:
- First, what is a project? For SOA, we really need to be checking in at what JDeveloper calls the SOA Application level, not at the level of what JDeveloper calls a SOA Project. The reason for this is quite straightforward – there are a number of circumstances under which it is not possible to build a SOA Project without having access to some of the information (files) in the SOA Application. For example, the presence of Human Tasks or Business Rules in a composite (SOA Project) is such an occasion. In both of these cases, you need to be able to access the adf-config.xml file in the SOA Application to get the necessary MDS configuration information to build the project.
- There are some directories that your version control client may automatically hide, because their names start with a period (‘.’) – for example, there is a ‘.adf‘ directory. These often contain important data and you need to make sure that you check them in.
- Depending on what you have done in your project, the out of the box ‘clean’ action might not do a proper clean up of your project. If you have created extra target directories, for example, you might need to make sure that those are removed too. A good example of this is when you use Maven to build a SOA project – it will create a target directory, in addition to the normal deploy directory. You need to make sure that the mvn clean also removes the deploy directory and/or that the IDE also removes the target directory.
The same goes for ADF and OSB projects. Given the flexibility provided by the tools, there is really no ‘one size fits all’ answer to this question; you need to invest the time to work out the correct answer for your own projects.
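If you want to sanity-check what would be checked in, a small script can help. The following is a minimal sketch in Python, assuming that the only derived directories are named deploy and target (the two mentioned above). The function name and the set of directory names are my own illustration, so adjust them for your projects. Note that it deliberately keeps directories whose names start with a period, such as .adf.

```python
import os

# Assumed names of derived-output directories; adjust these for your own projects.
DERIVED_DIRS = {"deploy", "target"}


def files_to_check_in(application_root):
    """Return relative paths of all files under application_root, skipping derived output."""
    selected = []
    for current, dirs, files in os.walk(application_root):
        # Prune derived directories in place so os.walk does not descend into them.
        # Dot-directories such as .adf are deliberately kept.
        dirs[:] = sorted(d for d in dirs if d not in DERIVED_DIRS)
        for name in sorted(files):
            full = os.path.join(current, name)
            selected.append(os.path.relpath(full, application_root))
    return selected
```

Running this at the SOA Application level (not the SOA Project level) and reading through the result is a quick way to spot both missing files and leftover build output.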
At the end of the day, there are two ways you can get this wrong:
- You leave out some files that are needed. This is a problem that you will need to go back and fix.
- You include some extra files that are not needed. This is probably not a big deal. It could possibly result in some extra files being in your binaries. That may or may not be a problem.
So if you are limited for time, better to opt for too much than too little.
How should we set up the repository?
A question that comes up fairly often, particularly with Subversion, is how to structure the repository.
There are two common approaches here – one is to have a single Subversion repository with all the projects in the same repository. The other is to have one repository per project (or development group, or whatever unit).
There are of course advantages and disadvantages to each approach. Commonly cited issues include: the amount of administration overhead, the time taken to perform backups (which is done by dumping the whole repository), issues with revision numbers (which are shared across the whole repository) and comments (knowing which ones are for a specific project), different security requirements, or code separation requirements for different projects, and different Subversion workflow requirements across projects.
In practice though, I think that these can be handled with approximately the same amount of effort regardless of the approach chosen, assuming a suitably experienced Subversion administrator.
I believe that the one repository approach is easier, and I would recommend taking that approach unless there is some specific reason not to. The most likely reason is that some project team does not want their code stored with another team’s code due to some kind of confidentiality or licensing issue (perceived or real).
So, if you are using Subversion, I would recommend a single Subversion repository, shared by all projects for a SOA environment.
Inside the repository, you should create zero or more levels of directories that you use to organise projects into logical groups, then under these, create a directory for each project (i.e. SOA Application, etc.) and under that create the recommended Subversion trunk, tags, and branches directories. This is also consistent with the approach recommended in Version Control with Subversion. So your repository might look like this:
- root
  - businessUnit1
    - project1
      - composites
        - GetCustomerDetails
          - trunk
          - tags
          - branches
        - ProcessOrder
          - trunk
          - tags
          - branches
      - ui-projects
        - ...
    - ...
  - businessUnit2
    - ...
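As a rough illustration of that layout, here is a minimal Python sketch that generates the repository paths for one project. The function name and its parameters are hypothetical. The only things taken from the layout above are the example directory names and the trunk, tags, and branches convention.

```python
import posixpath

# The three directories recommended under each versioned unit.
STANDARD_DIRS = ("trunk", "tags", "branches")


def repository_skeleton(groups, project, components):
    """Return repository paths for one project, each component with trunk/tags/branches."""
    base = posixpath.join(*groups, project)
    paths = []
    for component in components:
        for standard in STANDARD_DIRS:
            paths.append(posixpath.join(base, component, standard))
    return paths


paths = repository_skeleton(
    groups=["businessUnit1"],
    project="project1",
    components=["composites/GetCustomerDetails", "composites/ProcessOrder"],
)
```

Passing an empty `groups` list gives the ‘zero levels’ case, where projects sit directly under the repository root.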
With git, I am in the habit of creating a git repository for each project, as that is a more natural way to organise things in git.
Tagging, in the context of a version control system, is essentially making a named copy at a given point in time. (Though more than likely you will just be copying pointers, not all of the content.)
Why would you want to be able to go back to a given point in time? There are two excellent reasons:
- You have found a problem in an older version that is deployed in a production (or other) environment that you need to fix, so you need to get back the exact version of all of the source artifacts that were used to create that particular version, and
- Things have gone very bad and you need to go back to a known good point in the past.
So, the first one tells us that you must tag whenever you are going to release. You might also want to tag whenever you reach a significant or meaningful milestone.
The second reason can be addressed with tagging, but you might be better to use a branch in that case. We will talk about branches in a moment.
What is a version anyway?
Often people ask about the relationship between the ‘versions’ in the version control system and the build system, and the runtime versions. It is important to understand that there is not, and need not really be, any kind of direct relationship between them, other than for releases or release candidates.
Normally you are going to be building the latest version of the code – this is sometimes called the ‘head’, or ‘trunk’ or ‘tip’ – and executing tests against that. So you don’t need to know which ‘version’ (‘revision’ is a more accurate word from the point of view of the version control system) it is – you can just refer to it as the latest version. In Subversion, this is done by ending the URL with ‘/trunk‘.
The only other revisions that you are likely to build are the latest versions on a particular branch. Again, this can be done without knowing the revision number.
You would not need to build any tagged/released version again – you could just go and get the binary from the binary repository. Although you could easily build it again if you needed to by referring to the tag name in the URL.
And all of the other revisions are essentially old, discarded points in time that you have moved on from.
So you do need to know which tag relates to which binary version, and the easiest way to do this is to just use the binary version number in the tag. So, for example, you might tag revision 126 as ‘VERSION-2.3‘. If you needed to come back later and look at that version, you could just end your Subversion URL with /tags/VERSION-2.3.
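The URL conventions described here are simple enough to capture in a few lines. This is a minimal sketch in Python; the function names and the example repository URL are hypothetical, and the ‘VERSION-’ prefix simply follows the example tag above.

```python
def trunk_url(project_url):
    """URL of the latest code on the trunk."""
    return project_url.rstrip("/") + "/trunk"


def branch_url(project_url, branch_name):
    """URL of the latest code on a named branch."""
    return f"{project_url.rstrip('/')}/branches/{branch_name}"


def tag_url(project_url, binary_version):
    """URL of the tag that matches a released binary version."""
    return f"{project_url.rstrip('/')}/tags/VERSION-{binary_version}"


# Hypothetical repository location, for illustration only.
project = "https://svn.example.com/repo/businessUnit1/project1/composites/ProcessOrder"
release = tag_url(project, "2.3")
```

Because the tag name is derived from the binary version, nobody needs to remember that version 2.3 happened to be revision 126.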
That brings us to branching…
Keeping a log of all the changes over time is all very well and good, but projects don’t always follow a straight line. What happens when you do find the problem in your old version 2.3, the one you have in production, but you have already done a heap of work on version 2.4?
This is one thing that branches are good for. A branch lets you start a parallel stream of work from a given point in time (like a tag, but it can be from any revision). Consider the following diagram:
Here we can see that development of the next version can continue on the trunk, while another team works on fixing the production bug in version 2.3 in the branch. The two have no effect on each other. We can build either the trunk or the branch, depending on what we want to do.
When you finish fixing the bug, let’s say with revision 131, you should tag that one too, so you have another tag to go back to in case you find a bug in that ‘fixed’ version.
The other useful thing about branches is that you can merge them back into the trunk (or any other branch for that matter). This provides an ideal way to isolate potentially dangerous work. If you are trying something new, and you are not sure how it is going to work out, you can do that work in a branch. If it proves to be ok, you can merge the changes in that branch back into the trunk at some point in the future. But if it is not ok, you can just forget the branch and go on as if it never happened.
A note on merging
One really important thing to know is that you can merge backwards. This lets you essentially get rid of a bad commit. This is definitely something that you should learn how to do in your version control system.
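To show what merging backwards means conceptually, here is a toy model in Python. It is a sketch under simplifying assumptions, not how any real version control system implements it. The history is a list of full snapshots, the file names are hypothetical, and it assumes no later revision touched the same files, so there is no conflict handling.

```python
def changed_paths(before, after):
    """Paths whose content differs between two snapshots, including adds and deletes."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}


def reverse_merge(history, bad_revision):
    """Append a new revision to history that undoes the changes made in bad_revision.

    history is a list of snapshots (path -> content); the list index is the revision number.
    Naive: assumes no later revision changed the same paths.
    """
    before = history[bad_revision - 1]
    bad = history[bad_revision]
    head = dict(history[-1])
    for path in changed_paths(before, bad):
        if path in before:
            head[path] = before[path]  # restore the earlier content
        else:
            head.pop(path, None)  # the bad revision added this file, so remove it
    history.append(head)
    return len(history) - 1


history = [
    {},                                                      # r0: empty repository
    {"ProcessOrder.bpel": "v1"},                             # r1
    {"ProcessOrder.bpel": "v1-broken", "debug.xml": "tmp"},  # r2: the bad commit
    {"ProcessOrder.bpel": "v1-broken", "debug.xml": "tmp", "README": "docs"},  # r3
]
new_rev = reverse_merge(history, bad_revision=2)
```

Notice that the bad revision is not erased. The fix is a new revision on top, so the history still records both the mistake and its removal.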
When to create a version
Here is another meaningful question – how often should developers commit their changes to the trunk?
Later on we are going to talk about continuous integration. The word ‘integration’ in ‘continuous integration’ refers to the process of merging changes into a trunk. One of the foundation principles of continuous integration is that all developers must commit regularly to the trunk. What does regularly mean? Well no less frequently than once every day.
This drives some desirable behaviors. How much meaningful work can a developer do in a day? Not much, right? Right! That’s the whole idea! By forcing developers to work in small iterations, we are minimizing the size of potential integration issues that can occur. Less code to integrate – less serious problems integrating. Less serious problems, easier to fix. Fewer problems in the codebase at any given point in time – better quality! That is what continuous integration is aiming to achieve.
When we are using continuous integration, we build every time a developer commits to the trunk (or to a branch). But now the question arises – do we really want to build all of the developer’s little intermediate commits, that we know are not going to work anyway, since they are work in progress?
There are two ways we are going to talk about (in future articles) to address this:
- Many continuous integration servers support the notion of a pre-flight build. This is a build that is triggered by the developer’s work in progress commit. The sole purpose of this build is to see if they have broken anything. It does not need to go all the way through the build process, running all of the integration tests, getting ready for release. It is never going to be released. This gives the developer the freedom to experiment and check the results without bogging down the continuous integration system wasting a whole bunch of time and energy on builds that don’t matter.
- The other option comes with distributed version control. Here the developer can commit as many times as they like, but the commits will not flow through to the main repository – the one the continuous integration server is watching – until the developer does a ‘push’. When the continuous integration server sees a whole bunch of new commits, it is (usually) smart enough to just pick up the newest one and build that. This approach includes an implicit assumption that the developer is able to build and test the software on their own environment before pushing.
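The two options above can be sketched as a small decision function. This is a toy model in Python, not the behavior of any particular continuous integration server. The stage names, the `work_in_progress` flag, and the use of an increasing integer as the commit identifier are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    revision: int  # assumed to increase with each newer commit
    author: str
    work_in_progress: bool = False


# Assumed stage names; a real pipeline will have its own.
FULL_PIPELINE = ("compile", "unit-tests", "integration-tests", "package")
PREFLIGHT_PIPELINE = ("compile", "unit-tests")


def plan_build(pushed_commits):
    """Pick the newest commit from a push and choose which stages to run for it."""
    if not pushed_commits:
        return None
    newest = max(pushed_commits, key=lambda c: c.revision)
    stages = PREFLIGHT_PIPELINE if newest.work_in_progress else FULL_PIPELINE
    return newest, stages


pushed = [Commit(127, "alice"), Commit(128, "alice"), Commit(129, "alice")]
commit_to_build, stages = plan_build(pushed)
```

Three commits arrive in one push, but only the newest is built, and a work in progress commit would get the shorter pre-flight pipeline instead of the full one.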
More about version control
At this point, it starts to become difficult to talk about some of the strategies I want to discuss before we have moved forward a few more steps and talked about a few more topics, especially continuous integration. So let’s put version control on hold, and come back to it later.