
IMPORTANT NOTE: although this page has a "gitolite.com" URL, this is not about gitolite. That's just an artifact of "sitaramc.github.com" being translated to "gitolite.com" and so ALL my git related stuff gets carried over. Gitolite documentation has another /gitolite in the URL, so you can tell. My apologies for this confusion.

terminology

1 general VCS terminology

A project is the minimum set of source code (and related files) that need to be kept together to build the software. Example: Linux

each project will have one repository
one team can work on multiple projects, so there could be multiple repositories on each desktop

A working tree or worktree is the current set of files that are being worked on, tested, etc.

A branch in a project is an active line of development

master is the conventional name for the main development tree of a project
other conventional branches are next (for code that is ready to come into master), and various maintenance branches like v1.3 or v2.6.4 to designate released versions
these are only conventions, not rules, but they seem to work well in general
more detail about branches is in the next section

A feature is a part of a project that is large and complex enough that its day-to-day commits would be too noisy to include in the main project. Example: the disk subsystem, the networking subsystem, etc., in Linux

a feature branch is a branch for a feature, and is usually long-lived. This means it regularly acquires changes made in the main line, and – at stable points in its development cycle – merges its changes back into the main line

2 git-specific terminology

(By the way, git concepts simplified contains a much more extensive treatment of some of the following concepts, with lots of diagrams to help).

2.1 working tree, repository, bare repository

a working tree in git is the same as anywhere else. When you edit, compile, test, etc., it is your working tree files that are used
a normal repository in git consists of the working tree, plus one extra directory at the root of the project tree (meaning at the top directory of your project), called .git. Inside this .git are a bunch of files and directories which you need not worry about (and should not mess with unless you know what you’re doing)
a bare repository in git is a very special animal: it is a repository without a working tree (so you cannot edit, compile, test etc., in it). Central (or server) repos, meant only for people to push to and clone/pull from, are usually bare.
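
As a rough illustration of the difference, the sketch below creates one normal and one bare repository in a throwaway directory and asks git which is which. The directory names normal and central.git are made up for the example.

```shell
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp"

# A normal repository: a working tree plus a .git directory at its root.
git init -q normal
ls -d normal/.git

# A bare repository: no working tree; the directory itself holds
# what would otherwise live inside .git.
git init -q --bare central.git

normal_is_bare=$(git -C normal rev-parse --is-bare-repository)
central_is_bare=$(git -C central.git rev-parse --is-bare-repository)
echo "normal: bare=$normal_is_bare   central.git: bare=$central_is_bare"
```

The .git suffix on the bare repository's name is only a common convention, not something git requires.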

2.2 commits, SHAs

a commit records the content of the working tree at a point in time
every commit in git is identified by a 160-bit value (40 hex digits) that is, for practical purposes, globally unique. It is a cryptographic hash of the commit data, and is often called the SHA or SHA-1, after the algorithm used.

Note: “globally unique” means that, in practice, no other commit in any other repository in any other project anywhere in the world will have the same SHA. You can become very, very famous in the cryptography world if you find two different commits with the same SHA :-)
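
To see what a SHA looks like, here is a minimal sketch that makes one commit in a scratch repository and prints its full name. The identity and file name are placeholders, and the 40-digit length assumes a repository created with git's default (SHA-1) object format.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# The full name of the commit just made.
sha=$(git rev-parse HEAD)
echo "$sha"
echo "${#sha} hex digits"
```

The exact value will differ on every run, because the commit data includes the timestamp.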

2.3 parent commits, commit tree

most commits build on top of one or more previous commits, called parents. Normal commits have only one parent, while “merge” commits usually have two (git also allows merges with more than two parents, but these are uncommon). The very first commit in the tree has no parents; this is called a “root commit”.

Note: a repo can have more than one “root commit”, but this is rare.
every commit records the SHAs of its parent(s). It’s as if every commit has an arrow pointing to each of its parents, like a back pointer in a linked list
- thus, merge commits have two (or more) arrows pointing out
- and a “fork” (where you created a new branch) has two or more arrows pointing in, because it’s the parent for more than one commit
as a result, the entire repo can look like a tree. Strictly speaking, this is a DAG – or “directed acyclic graph”.
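
The parent arrows can be inspected directly. The following sketch (with made-up file and branch names) builds a root commit, a fork, and a merge, then counts the parents of the root and of the merge commit.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Root commit: no parents.
echo one > a.txt; git add a.txt; git commit -q -m "root"
root=$(git rev-parse HEAD)
main_branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Fork: the next two commits both have "root" as their parent.
git checkout -q -b side
echo two > b.txt; git add b.txt; git commit -q -m "on side"
git checkout -q "$main_branch"
echo three > c.txt; git add c.txt; git commit -q -m "on main line"

# Merge commit: two parents.
git merge -q --no-edit side

# Each output line is a commit SHA followed by the SHAs of its parents.
git rev-list --parents HEAD

# Number of parents = words on the line, minus one for the commit itself.
root_parents=$(( $(git rev-list --parents -n 1 "$root" | wc -w) - 1 ))
merge_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "root has $root_parents parents, merge has $merge_parents"
```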

2.4 head, branch, tag

a branch represents an active line of development; you use different branches to track different lines of development. Without branches, working on multiple features or versions simultaneously would be a nightmare!
in practice, a branch is a symbolic name that points to the most recent commit on that line of development. This “tip” commit is called the head of that branch.

When you checkout a branch and make a new commit on it, the “branch” now points to the new head node. In other words, a branch is a pointer that moves as you make commits.

A repository can track many branches, but the working tree is associated with at most one branch at a time.
a tag is also a symbolic name given to a commit. Tags don’t “move” as you make new commits – they stay where they are.

Tags are used to mark specific milestones in a project, like “v1.3”, or “v2.9”, etc.
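
A small sketch of the difference between the two kinds of names: after a second commit, the branch points at the new commit while the tag still points at the old one. The tag name is borrowed from the example above; everything else is a placeholder.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > a.txt; git add a.txt; git commit -q -m "first"
branch=$(git symbolic-ref --short HEAD)

git tag v1.3                                # name the current commit
before=$(git rev-parse "refs/heads/$branch")

echo two >> a.txt; git commit -q -a -m "second"

after=$(git rev-parse "refs/heads/$branch") # the branch has moved
tagged=$(git rev-parse "refs/tags/v1.3")    # the tag has not

echo "branch: $before -> $after"
echo "tag:    $tagged"
```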

2.5 reachability

you can start from a commit X and follow the parent SHAs (“arrows”) until you come to a root commit. The commits you found in this process are all said to be reachable from X
if you have commit X in some other repo, git guarantees that that repo also has all these “reachable” commits.

Note: if you know about grafts or shallow clones, you’re too advanced to be reading this document ;-)
when you merge a branch into the currently checked out branch, git uses reachability to determine what commits to actually bring in. All the commits reachable from the branch being merged, which are not already reachable from the current branch, are pulled in and applied
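
Reachability can be explored with git rev-list. The sketch below assumes a tiny made-up history (one shared root commit, two commits on a branch called side, one on the main line) and counts what a merge would bring in, before and after the merge.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 0 > root.txt; git add root.txt; git commit -q -m "root"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b side
echo 1 > s1.txt; git add s1.txt; git commit -q -m "side 1"
echo 2 > s2.txt; git add s2.txt; git commit -q -m "side 2"

git checkout -q "$main_branch"
echo 3 > m1.txt; git add m1.txt; git commit -q -m "main 1"

# Everything reachable from side: its two commits plus the root.
reachable_from_side=$(git rev-list --count side)

# Reachable from side but NOT from the current branch:
# these are the commits a merge would bring in.
to_merge=$(git rev-list --count "$main_branch"..side)

git merge -q --no-edit side

# After the merge, nothing on side is missing from the current branch.
left_over=$(git rev-list --count "$main_branch"..side)

echo "$reachable_from_side reachable, $to_merge to merge, $left_over left over"
```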

2.6 master, HEAD

master: default branch in a project (main development tree), by convention
HEAD: tip of the branch associated with the working tree; this is where commits go. Normally. There is also something called a ‘detached HEAD’ that you should be aware of. See the article on the detached HEAD for more.
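
As a sketch of what HEAD normally is, the commands below show that it is a symbolic name for the checked-out branch, and that it stops being one once it is detached. The repository and identity are placeholders.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo one > a.txt; git add a.txt; git commit -q -m "first"

# Normally HEAD is a symbolic reference to a branch...
head_ref=$(git symbolic-ref HEAD)           # something like refs/heads/master
# ...so HEAD and the branch tip resolve to the same commit.
head_sha=$(git rev-parse HEAD)
tip_sha=$(git rev-parse "$head_ref")
echo "HEAD -> $head_ref ($head_sha)"

# A detached HEAD points straight at a commit instead of at a branch.
git checkout -q --detach
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi
echo "detached: $detached"
```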

2.7 index

a staging area for the next commit; when you commit, the current index is turned into a real commit object
the index is also called stage or cache
staging a file (git add file) marks its current contents for inclusion in the next commit
unstaging a file (git reset file, or in the case of a new file, git rm --cached file) undoes all staging since the last commit
the unstaged changes are the difference between index and working tree (git diff)
the uncommitted but staged changes are the difference between HEAD and index (git diff --cached)
see uses of index for the cool things you can do with it
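
The commands in this section can be tried out together. This is a minimal sketch with placeholder file names: it stages one change, leaves another unstaged, then unstages both an existing file and a new one.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo base > file.txt; git add file.txt; git commit -q -m "first"

echo "staged change" >> file.txt
git add file.txt                            # stage the current contents
echo "unstaged change" >> file.txt          # edit again without staging

staged=$(git diff --cached --name-only)     # HEAD versus index
unstaged=$(git diff --name-only)            # index versus working tree
echo "staged: $staged   unstaged: $unstaged"

# Unstage: the index goes back to what HEAD has; the working tree is untouched.
git reset -q -- file.txt
staged_after=$(git diff --cached --name-only)

# For a brand-new file there is nothing in HEAD to go back to,
# so it is removed from the index instead; the file itself stays on disk.
echo new > new.txt
git add new.txt
git rm -q --cached new.txt
untracked=$(git ls-files --others --exclude-standard)
echo "untracked: $untracked"
```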

3.1 clone versus checkout

In a distributed VCS, every developer’s workstation has a full copy of the entire repository. It is therefore called a clone – you clone the remote repository, you don’t merely checkout the latest version from it. (Of course, you can have several clones on the same workstation if you wish).

So where does checkout come in? In git, you checkout a branch from your local repo, so this happens after the clone. Remember this is a full repo, so you already have all the branches that the parent repo had when you cloned.

By default, after the clone is done, git will checkout the same branch that was currently checked out in the repository you’re cloning from. However, you can checkout some other branch at any time (and if you’re using git, you will checkout and manage multiple branches on your local repo all the time, otherwise you’re not really using git!)
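
A rough sketch of the sequence: clone first, checkout afterwards. It uses a local scratch repository (named upstream here, purely for illustration) as the thing being cloned, with a second branch called next.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# A stand-in for the repository being cloned, with two branches.
git init -q upstream
cd upstream
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo one > a.txt; git add a.txt; git commit -q -m "first"
upstream_branch=$(git symbolic-ref --short HEAD)
git branch next
cd ..

# The clone is a full copy; git then checks out the branch
# that was checked out in the repository being cloned.
git clone -q upstream work
cd work
cloned_branch=$(git symbolic-ref --short HEAD)

# The other branches came along too, as origin/<name>.
git branch -r

# Checking out one of them is a purely local operation.
git checkout -q next
now_on=$(git symbolic-ref --short HEAD)
echo "clone started on $cloned_branch, now on $now_on"
```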

3.2 accessing remote repositories

Every remote repository (often called just a remote) has a URL associated with it, which tells your (local) git client how to reach it. There are typically 4 kinds of Git URLs:

ssh: like ssh://[user@]host.xz[:port]/path/to/repo.git/
http: like http[s]://host.xz[:port]/path/to/repo.git/
git: like git://host.xz[:port]/path/to/repo.git/ – note that this is an unauthenticated protocol suitable only for allowing downloads of open source or similar software
local file: like file:///full/path/to/reponame

See ‘man git-clone’ for all the allowed syntaxes for git URLs.
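
Of these, only the local file form can be tried without a server. The sketch below builds a scratch bare repository (the names src and reponame.git are made up) and reaches it through a file:// URL; the other three forms are used the same way, with a real host in place of the path.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Build something to point a URL at.
git init -q src
cd src
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo one > a.txt; git add a.txt; git commit -q -m "first"
cd ..
git clone -q --bare src reponame.git

# $tmp is an absolute path, so this becomes file:///.../reponame.git
url="file://$tmp/reponame.git"

# List what the remote has, without cloning it.
refs=$(git ls-remote "$url")
echo "$refs"

# Clone through the URL; git remembers it as the URL of "origin".
git clone -q "$url" copy
origin_url=$(git -C copy remote get-url origin)
echo "$origin_url"
```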

3.3 naming remote repositories

You can refer to (or fetch from, and push to) more than one remote repo in your clone, each with a different URL. After a while it gets inconvenient to use the full URLs in your git fetch and git push commands, so git allows you to give an easy to remember “nickname” for each “remote”. I could, for instance, do this:

git remote add sejal ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git

After this I could refer to the longer URL by the shortname “sejal” in most git commands and it would be the same thing.

(Note: for convenience, a ‘remote’ called ‘origin’ is automatically created when you clone a repo, pointing to the repo you cloned from.)
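
To check what a nickname stands for, something like the following works. It is only a sketch: the repository is a scratch one, and adding a remote does not contact the host, so the URL does not need to be reachable.

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo

# Adding the nickname just records the URL in this repo's configuration.
git remote add sejal ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git

# List all nicknames with their URLs...
git remote -v

# ...or ask for one of them.
sejal_url=$(git remote get-url sejal)
echo "$sejal_url"
```

After this, a command such as git fetch sejal would contact that URL (not run here, since the host in the example is not a real one).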

3.3.1 origin

Now that you know what a nickname is, you can understand what “origin” means. “origin” is just the default name given to the remote repository when you do a git clone. So for example, if I do

git clone ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git

then, after the clone completes, git automatically creates a remote with the nickname origin, which points to ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git. It’s just a convenience thing. You can delete that nickname, you can rename it to something else, etc., if you like.
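
A small sketch of that, using a local scratch repository in place of the ssh URL so it can run anywhere. The new nickname, upstream, is an arbitrary choice.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# A stand-in for the repository being cloned.
git init -q src
cd src
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo one > a.txt; git add a.txt; git commit -q -m "first"
cd ..

git clone -q src work
cd work

remotes_initial=$(git remote)               # the clone created "origin"
git remote rename origin upstream           # give it another nickname
remotes_renamed=$(git remote)
git remote remove upstream                  # or drop the nickname entirely
remotes_final=$(git remote)

echo "initial: $remotes_initial   renamed: $remotes_renamed   final: '$remotes_final'"
```

Removing the nickname only affects this clone's configuration; the repository it pointed to is not touched.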