git notes main page | gitolite main page | license
IMPORTANT NOTE:
although this page has a "gitolite.com" URL, this is not about gitolite.
That's just an artifact of "sitaramc.github.com" being translated to
"gitolite.com" and so ALL my git related stuff gets carried over.
Gitolite documentation has another /gitolite
in the URL, so
you can tell. My apologies for this confusion.
CVS/SVN folks – please read this first
SVN and CVS treat a branch as another subdirectory of your project. Conversely, you can also pretend a subdirectory is a branch. In other words, branches (and tags) share the same namespace as the subdirectories of your project.
This is not the right way to think of branches. A branch is merely an alternate reality for your project. That’s the simplest way to put it. Each commit is a “snapshot” of the state of your code at a certain point in time, and each branch (consisting of a series of commits) represents a different timeline than the other branches do.
Thus, branches and subdirectories have nothing to do with each other.
The git repository is really just a DAG (directed acyclic graph) of commits. Each commits points to its parent or parents (merge commits have more than one parent). A branch in git is merely a pointer to some commit in the DAG.
When you checkout a branch, git simply makes your working tree look like the files that existed when that commit was made, and makes a note that this is your “current” branch. When you make a commit on top of this, git updates this current branch to point to the new “head” commit. This is conceptually the same as adding a new item to a linked list and moving the “head” pointer to the new item, if you remember your data structures course from many years ago.
A developer’s repository will typically have several branches residing in it, although of course only one is “checked out” at any given time into the working tree.
Apart from local branches (like master
, next
, and maybe v1.2
, some-new-feature
), etc., you will also have a set of branches called origin/branchname (example: origin/master
). These are your copy of the corresponding branch from the “origin” server. You should never commit to them directly.
And the most important thing is, you can have any other branches you wish! Go wild – no one will stop you, and it’s quite difficult to lose data so have fun!
As you read about workflows in modern VCSs, you’ll often hear about topic branches. According to the git glossary, a topic branch is:
A regular git branch that is used by a developer to identify a conceptual line of development. Since branches are very easy and inexpensive, it is often desirable to have several small branches that each contain very well defined concepts or small incremental yet related changes.
A topic branch could represent any self-contained, pretty much independent-of-others, line of development. A specific bug, a new feature, a refactor, a set of enhancements to a core feature, etc., can all be managed as topics in their own right.
Let’s say your version tree looks like this:
V3---o---o---o---o---o---V4---o---o--- (master)
\ \
\---o---o--- (V3maint) \
\ \---o---o--- (V4maint)
\-o--- (topicA) \
\-o--- (topicB)
In this picture, V3 and V4 are tags representing specific versions. V3maint is a maintenance branch that collects ongoing fixes for V3. These ongoing fixes could come from commits within V3maint itself, or by integrating topic branches such as topicA. At various points in it’s lifecycle, when things are stable, the top commit on V3maint will be tagged V3.1, V3.2, etc., as needed.
(…all this applies equivalently to V4 also, of course…)
The “master” branch is where on-going development for the future V5 is happening. This “new” development may well have its own set of topic branches, probably by feature name, but right now we’ll concentrate on the maintenance aspect of fixing bugs or making enhancements to the released versions.
The simplest way to manage the flow of patches and fixes is this:
when a change is small enough to not need a topic branch, commit it on the oldest branch that needs the change. In this example, that could be the V3maint or V4maint branch, depending on whether the issue affects V3 or not.
Note that you cannot commit on V3. V3 is a tag – a tag represents a “fixed point” in the commit hierarchy – it’s value can’t change. You can only commit on a “branch”. In C terms, a tag is a const
:-)
Also, note that although V3maint is primarily an integration branch for the topic branches pertaining to V3, in this case it is acting as a “catchall” topic branch for commits that are too small to have their own topics.
for larger changes, create a topic branch by making a fork at the oldest branch that needs this change. For a bug that is applicable to V4 but not to V3, you’d fork a topic branch at V4. On the other hand, if the bug applies to V3 also, you’d fork at V3.
Note that you’re not forking at V3maint or V4maint; you’re forking at V3 or V4. See discussion in the previous bullet about V3maint’s role.
However, it may sometimes happen that the bug does not apply to V3 but only to V3maint (perhaps it’s a bug that was caused by some previous fix within the V3maint branch). In that case, you must commit/fork it at the current tip of V3maint.
Periodically merge the older branches into newer ones so the new branches get the benefit of accumulated fixes and enhancements on the older branches. This means, for instance, that we merge V3maint, as well as topic branches that forked from V3 (like topicA), into V4maint, and then V4maint, (plus topics like topicB) into “master”, which is the future V5, etc.
The exception to these rules is if you have bugs on older versions that do not apply to newer versions. Merging fixes that are relevant only to V3, into the V4 branch, is meaningless, and may even get you conflicts. The way to deal with those is as follows:
Note that V3only will never get merged into V4maint or V3maint, preserving the condition that commits made in V3only are not picked up by the newer versions.
In all the above discussion, we kept saying “merge X into Y”. Which is all very well if things go fine, but how do you back out if something went wrong?
The answer is something called a “throwaway integration branch”.
We’ll use an analogy to understand this. Let’s say you’re writing a function to insert a bunch of items into a linked list. And let’s say it’s a complex function, and at the start, you’re not sure if it’s going to succeed, and you want the “head” pointer to be updated only if the whole thing succeeds.
How would you deal with this? You’d make a temporary copy of the “head” pointer, use that in your function, and if the function returns a success value, you’d update the real “head” pointer. Conversely, if the function returned an error, you would leave your real “head” pointer as it is and just discard the temporary value.
Git makes creating a new branch just as easy. So instead of doing this:
git checkout V4maint
git merge V3maint # what if merge fails?
make test # what if merge succeeds but tests fail?
where if either of the last 2 steps fail, you have to figure out how to get back to the old V4, you do this instead:
git branch -f test V4maint # forcibly set test branch to V4
git checkout test # and check it out
git merge V3maint
make test
# all good?
git branch -f V4maint test # forcibly set V4maint to test
And of course if things didn’t work out you would not do that last step, preserving your V4maint at its previous revision.