

git as a deployment tool

One of the first things people want to do with a server side repo is to automatically deploy something when someone pushes to it.

And one of the first things they hear when they ask about it on #git is “git is not a deployment tool, so don’t do that” ;-)

So let’s talk deployment…

1 introduction

Deploying on a push to a bare repo should be easy. Just add a post-receive hook that contains this code:

#!/bin/sh
some-magic-deploy-command /deploy/directory

Opinions differ on what that magic deploy command should actually be, and indeed whether there should even be one at all :-)
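Whatever the command ends up being, it helps to know what the hook gets as input: git feeds post-receive one line per updated ref on stdin, in the form "old-sha new-sha refname". Here is a minimal sketch, assuming you only want to deploy when master is pushed. The function name pushed_master is mine, and some-magic-deploy-command is still the placeholder from above.

```shell
#!/bin/sh
# Sketch only: succeed if the push updated refs/heads/master.
# post-receive gets lines of the form "<old-sha> <new-sha> <refname>" on stdin.
pushed_master() {
    while read -r old new ref; do
        [ "$ref" = "refs/heads/master" ] || continue
        # an all-zero new sha means the branch was deleted: nothing to deploy
        case $new in
            *[!0]*) return 0 ;;
        esac
    done
    return 1
}

# In the hook itself you would then write something like:
#   pushed_master && some-magic-deploy-command /deploy/directory
```

Without a check like this, the hook fires on every push, including pushes to branches you never deploy.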

2 deployment rules

Here’s what we expect from a deployment tool. Note the rule numbers – we’ll be referring to some of them simply by number later.

1. All files in the branch being deployed should be copied to the deployment directory.
2. Files that were deleted in the git repo since the last deployment should get deleted from the deployment directory.
3. Any changes to tracked files in the deployment directory after the last deployment should be ignored when following rules 1 and 2.

   However, sometimes you might want to detect such changes and abort if you found any.
4. Untracked files in the deploy directory should be left alone.

   Again, some people might want to detect this and abort the deployment.

NOTE: if you need more than this, you should seriously consider a proper deployment tool. The best one I have heard of (but not used, since I don’t need that level of control) is git-deploy. It’s educational to at least be aware of what is possible with a real tool, even if your current situation does not require it.

3 why git is not a deployment tool

The most important reason is this: git does not track permissions (other than the first “x” in “rwxrwxrwx”). Many real-world deployments need a bit more granularity than that, I suspect.

Also, git does not track empty directories, which could also be a problem (albeit less common than permissions).

There could be other reasons; I’ll update this section as I hear of any but I’m sure those are the main ones.

4 six different ways to deploy

We’ll discuss these 6 ways of deploying your project using git. Note that none of these methods can overcome git’s inherent limitations for deployment (permissions, etc., as described above), though any of them can be followed up with extra code to do that, using some site-local script.

Also note that none of them involve git pull, which a naive user might think is worth trying, but in fact has absolutely no place in an automated setup or script of any kind. (If you don’t know why, the short answer is “pull = fetch + merge” and merges cannot be guaranteed to complete without manual fixups).
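To illustrate the "extra code" mentioned above for permissions and empty directories, here is a sketch of a site-local fixup step that could run after any of the methods below. Everything specific in it is made up for the example: the function name, the cache directory, and the modes. Substitute whatever your application actually needs.

```shell
#!/bin/sh
# Sketch only: restore what git does not track, after the files are deployed.
# Usage: fix_deploy_tree /deploy/dir
fix_deploy_tree() {
    dir=$1
    # git does not track empty directories, so recreate the ones the app
    # expects ("cache" is a hypothetical example)
    mkdir -p "$dir/cache" || return 1
    # git only tracks the owner's executable bit, so set real modes explicitly:
    # directories 755, plain files 644, executable files 755
    find "$dir" -type d -exec chmod 755 {} + &&
    find "$dir" -type f ! -perm -0100 -exec chmod 644 {} + &&
    find "$dir" -type f -perm -0100 -exec chmod 755 {} +
}
```

A real site would more likely have a short list of specific files and modes; the blanket find is just the shortest thing that shows the idea.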

4.1 checkout – the “fight club” of git deployment

Summary

export GIT_WORK_TREE=/deploy/dir
git checkout -f master

Pros: satisfies all 4 of our rules.

Cons: you don’t talk about it or ask questions about it ;-) Jokes apart, see discussion below.

Discussion

If you want to use this method, you have to make sure:

you always checkout to the same deploy directory
you always checkout the same branch
you always use “-f”

But it’s hard to explain why. (It has to do with HEAD, the index, and the actual work tree getting out of sync in ways that won’t ever happen on a normal repo using normal git commands).

So if you run into trouble, it becomes a game of trying to dig out of you what exactly you did in what sequence. And if you think that’s easy you’ve never done tech support :)

Aborting on changes to tracked files is easy. Before the checkout, run:

git diff --quiet || exit 1

Aborting on finding untracked files is also easy enough:

git ls-files -o | grep >/dev/null . && exit 1
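Put together, a hook using this method with both optional checks might look like the sketch below. It assumes it runs inside a post-receive hook (current directory is the bare repo, GIT_DIR already set by git); the function name and its two arguments are mine, not part of git.

```shell
#!/bin/sh
# Sketch only: the "checkout" method with both optional abort checks.
# Usage (from the post-receive hook): deploy_checkout /deploy/dir master
deploy_checkout() (
    # the ( ) body runs in a subshell, so the export does not leak out
    export GIT_WORK_TREE="$1"
    branch=$2

    # optional: abort if tracked files were changed since the last deploy
    git diff --quiet || { echo "tracked files changed in $1" >&2; exit 1; }

    # optional: abort if there are untracked files
    if git ls-files -o | grep -q .; then
        echo "untracked files found in $1" >&2
        exit 1
    fi

    # always the same directory, the same branch, and always -f
    git checkout -q -f "$branch"
)
```

If you want plain rules 3 and 4 (ignore local changes, leave untracked files alone), drop the two checks and keep only the export and the checkout.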

4.2 fetch – reverse the flow

Summary

cd /deploy/dir
unset GIT_DIR
git fetch origin
git reset --hard origin/master

Pros: satisfies all 4 rules

Cons: your deploy directory needs to be a clone of the bare repo.

Discussion

If your deploy directory is OK with having a “.git” sitting in it (i.e., whatever application you have won’t barf because of it), this is a simple way of reversing the flow.

A lot of people try this; they fail because they don’t realise git sets up an extra environment variable, GIT_DIR (because it expects to be running in a bare repo), that you need to unset when you cross over into a non-bare repo.

Aborting on finding local changes is the same as in the previous section. Run those commands just before the fetch.

Security: a readable “.git” may be a security issue in some cases.
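For completeness, here is a sketch of the whole thing as a function you could call from the post-receive hook. It assumes the deploy directory was created once, by hand, with "git clone /path/to/repo.git /deploy/dir" (so that origin points back at the bare repo); the function name and arguments are mine.

```shell
#!/bin/sh
# Sketch only: the "fetch" method.
# Assumes /deploy/dir is an existing clone whose "origin" is the bare repo.
# Usage (from the post-receive hook): deploy_fetch /deploy/dir master
deploy_fetch() (
    # subshell body: the cd and the unset stay local to this function
    cd "$1" || exit 1
    # the hook environment has GIT_DIR set for the bare repo; drop it so git
    # finds the .git directory of the clone instead
    unset GIT_DIR
    git fetch -q origin || exit 1
    git reset -q --hard "origin/$2"
)
```

Note that reset --hard only touches tracked files, which is why untracked files in the deploy directory survive (rule 4).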

4.3 reset – another reversing trick

Summary

GIT_DIR=$PWD
cd /deploy/dir
export GIT_WORK_TREE=.
git reset --hard

Pros: satisfies all 4 rules

Cons: newbie-unsafe; see below.

Discussion

This code deploys whatever HEAD points to, as it doesn’t explicitly name a branch.

A newbie may be tempted to fix that:

git reset --hard master     # DONT DO THIS!

In a word, DON’T! If you did not change HEAD you do not need it. And if you did change HEAD to something other than master, you’d lose that branch: reset moves whichever branch HEAD points to, so that branch would end up pointing at master’s commit. (Didn’t see that coming did ya?)

If you never change HEAD (and no one in their right mind at your site would), you can use this. But do you really want to take that chance?

Finally, it has no advantage over the first method in any way.
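If you do go this way anyway, you can at least make the hook refuse to run when HEAD is not what you expect. A sketch, using git symbolic-ref to read HEAD; the function name and arguments are mine, and the guard is my addition, not part of the method as described above.

```shell
#!/bin/sh
# Sketch only: the "reset" method, refusing to run if HEAD was changed.
# Run from the hook, i.e. with the current directory set to the bare repo.
# Usage: deploy_reset /deploy/dir master
deploy_reset() (
    GIT_DIR=$PWD
    export GIT_DIR
    # HEAD of the bare repo must still be the branch we mean to deploy
    head=$(git symbolic-ref HEAD) || exit 1
    if [ "$head" != "refs/heads/$2" ]; then
        echo "HEAD is $head, not refs/heads/$2; refusing to deploy" >&2
        exit 1
    fi
    cd "$1" || exit 1
    export GIT_WORK_TREE=.
    # no branch name here on purpose: see the warning above
    git reset -q --hard
)
```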

4.4 direct archive dump

Summary

git archive master | tar -C /deploy/dir -xf -

Pros: clean and simple

Cons: doesn’t satisfy rule 2 (deleted files don’t get deleted).

Discussion

Pretty useless, since it violates rule 2. If you prefer to sacrifice rule 4 instead of rule 2, see the next section!

It’s also inefficient for large repos if only a few files have changed.

4.5 archive dump with staging

Summary

mkdir /deploy/dir.new
git archive master | tar -C /deploy/dir.new -xf -
mv /deploy/dir      /deploy/dir.old
mv /deploy/dir.new  /deploy/dir
rm -rf              /deploy/dir.old

Pros: clean and simple, using only a tar file and shell commands.

Cons: sacrifices rule 4.

Discussion

This method uses a staging directory between the repo and the deploy directory. It is best used if you need to make any adjustments before the actual files are copied over.

While still inefficient (in terms of disk writes compared to number of changed files), this method compensates for the inefficiency by allowing almost-atomic switchover if you put the staging directory on the same filesystem.

Now, while I would consider a two-mv command sequence as atomic enough, there’ll always be nitpickers who insist that “almost” isn’t good enough. Patches welcome :-)

(IMO, if you’re worrying about that level of atomicity in these days of “eventually consistent” databases you should be writing this document not reading it!)
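For the nitpickers, one common way to close the gap is a symlink switch: make the deploy path a symlink to a release directory, and replace the symlink with a single rename. This is a sketch under some assumptions that are not in the text above: it needs GNU mv (for -T), the deploy path must be a symlink (or not exist yet) rather than a real directory, and the layout with separate release directories is my own invention.

```shell
#!/bin/sh
# Sketch only: point a symlink at a new release directory in one rename.
# Assumes GNU coreutils: "mv -T" treats the destination as a plain name,
# so the existing symlink is replaced instead of being followed.
# Usage: switch_release /deploy/dir /deploy/releases/some-release
switch_release() {
    link=$1
    target=$2
    tmp="$link.tmp.$$"
    # build the new symlink under a temporary name first...
    ln -s "$target" "$tmp" || return 1
    # ...then rename it over the old one
    mv -T "$tmp" "$link"
}
```

You would extract each deployment into a fresh release directory (with git archive, as above), call switch_release, and clean up old releases whenever convenient.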

Security: You also need to worry about security for the temp dir if untrusted people have local shell access to the server. Consider using mktemp and umask.

Aborting on finding local changes is not easy. You need to checkout the old commit somewhere and compare files manually. If this is important, use one of the earlier methods.
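Here is a sketch of the same method with the mktemp and umask advice from the security note applied. It is meant to run from the bare repo (for example in a post-receive hook). The function name, the naming of the temporary directories and the final 755 mode are my assumptions.

```shell
#!/bin/sh
# Sketch only: "archive dump with staging", using mktemp and umask.
# Usage (from the bare repo): deploy_staged /deploy/dir master
deploy_staged() (
    dir=$1
    branch=$2
    # a pipeline's exit status is tar's, so check the branch up front
    git rev-parse -q --verify "$branch^{commit}" >/dev/null || exit 1
    umask 022
    # mktemp -d makes a mode-700 directory with an unpredictable name, next
    # to the deploy dir so it is on the same filesystem
    new=$(mktemp -d "$dir.new.XXXXXX") || exit 1
    old="$dir.old.$$"

    git archive "$branch" | tar -C "$new" -xf - || { rm -rf "$new"; exit 1; }
    chmod 755 "$new"    # open it up only once it is fully populated

    if [ -d "$dir" ]; then
        mv "$dir" "$old" || exit 1
    fi
    mv "$new" "$dir" && rm -rf "$old"
)
```

Any adjustments you need (permissions, generated files and so on) would go between the tar and the first mv.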

4.6 archive dump with staging, using rsync

This is a variation of the previous one, except that your deploy directory is on a different host, or under a different user on the same server. The git user needs ssh access to the deploy user, and you use rsync to copy the files.

You no longer get atomic switchover, but rsync’s network efficiency is probably good enough. Maybe not as instantaneous as two rename commands (or a symlink switch) but not too bad.

EugeneKay has written a very comprehensive script that uses this technique, but adds many more bells and whistles; find it here.

Security: If you don’t want your git user to have full access to your deploy user, you need to restrict what it can do using ‘rrsync’, a script that comes with rsync. This is left as an exercise for the reader :-)
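The section above does not spell out the commands, so here is a minimal sketch of one way to do it. This is not EugeneKay's script, just the bare idea; the function name is mine and the target "deploy@apphost:/deploy/dir/" in the usage line is a hypothetical example.

```shell
#!/bin/sh
# Sketch only: stage with git archive, then push the tree out with rsync.
# Usage (from the bare repo): deploy_rsync master deploy@apphost:/deploy/dir/
deploy_rsync() (
    branch=$1
    dest=$2
    # a pipeline's exit status is tar's, so check the branch up front
    git rev-parse -q --verify "$branch^{commit}" >/dev/null || exit 1
    umask 022
    stage=$(mktemp -d) || exit 1
    git archive "$branch" | tar -C "$stage" -xf - || { rm -rf "$stage"; exit 1; }
    # rsync -a copies the mode of the staging dir itself, and mktemp made it
    # 700, so widen it before the transfer
    chmod 755 "$stage"
    # --delete removes files that are gone from the repo (rule 2), but also
    # any untracked files on the far side (rule 4)
    rsync -a --delete "$stage/" "$dest"
    status=$?
    rm -rf "$stage"
    exit $status
)
```

The trailing slash on the staging path matters: it tells rsync to copy the contents of the directory rather than the directory itself.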

5 summary/recap

We’ve seen 6 methods. Here’s my summary. I don’t consider “reset” and “direct archive dump” useful enough to be in the running.

1. checkout – this is the one I prefer; in fact, I use it in gitolite. It’s very efficient (only changed files are written, as normal in git) and newbie friendly enough. The constraints imposed are simple to understand and easy to stick to.

   In addition, it’s easy to detect local changes and abort. (This is equally true for the next 2 methods.)
2. fetch – this is pretty intuitive even for git-newbies (while also remaining efficient), but your deploy directory needs to be a clone, which could be a problem.
3. reset – highly newbie-unsafe. No particular advantage over the “checkout” method anyway.
4. direct archive dump – this is the only one that doesn’t satisfy rule 2, so it’s kinda useless :(
5. archive dump with staging – the absolute simplest in terms of git concepts. It’s inefficient but compensates by allowing almost-atomic switchovers.
6. archive dump with staging using rsync – very useful if your deploy dir is on a different server, or under a different user on the same server.