IMPORTANT NOTE:
although this page has a "gitolite.com" URL, this is not about gitolite.
That's just an artifact of "sitaramc.github.com" being translated to
"gitolite.com" and so ALL my git related stuff gets carried over.
Gitolite documentation has another /gitolite in the URL, so you can tell. My apologies for this confusion.
A project is the minimum set of source code (and related files) that needs to be kept together to build the software. Example: Linux
A working tree or worktree is the current set of files that are being worked on, tested, etc.
A branch in a project is an active line of development
A feature is a part of a project that is large and complex enough that its day-to-day commits would be too noisy to include in the main project. Example: the disk subsystem, the networking subsystem, etc., in Linux
(By the way, git concepts simplified contains a much more extensive treatment of some of the following concepts, with lots of diagrams to help).
a working tree in git is the same as anywhere else. When you edit, compile, test, etc., it is your working tree files that are used
a normal repository in git consists of the working tree, plus one extra directory at the root of the project tree (meaning at the top directory of your project), called .git. Inside this .git are a bunch of files and directories which you need not worry about (and should not mess with unless you know what you’re doing)
a bare repository in git is a very special animal: it is a repository without a working tree (so you cannot edit, compile, test etc., in it). Central (or server) repos, meant only for people to push to and clone/pull from, are usually bare.
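To see the difference between a normal and a bare repository for yourself, here is a small sketch. The directory names normal and central.git are made up for this example, and everything happens in a throwaway temporary directory.

```shell
set -e
work=$(mktemp -d)                 # scratch directory for the demo
cd "$work"

git init -q normal                # normal repo: working tree plus normal/.git
git init -q --bare central.git    # bare repo: no working tree at all

normal_is_bare=$(git -C normal rev-parse --is-bare-repository)
central_is_bare=$(git -C central.git rev-parse --is-bare-repository)

echo "normal:      bare=$normal_is_bare"
echo "central.git: bare=$central_is_bare"
ls -d normal/.git                 # the one extra directory at the root of the project
```

The ".git" suffix on the bare repository's name is only a common convention, not something git requires.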
a commit records the content of the working tree at a point in time
every commit in git is represented by a globally unique 160-bit value (40 hex digits) which is a cryptographic hash of the commit data. This is often called the SHA or SHA-1, after the algorithm used.
(“Globally unique” means no other commit in any other repository in any other project anywhere in the world will have the same SHA. You can become very, very famous in the cryptography world if you find two different commits with the same SHA :-)
most commits build on top of one or more previous commits, called parents. Normal commits have only one parent, while “merge” commits have two (or, occasionally, more). The very first commit in the tree has no parents; this is called a “root commit”.
(A repo can have more than one “root commit”, but this is rare.)
every commit records the SHAs of its parent(s). It’s as if every commit has an arrow pointing to its parents, like a back pointer in a linked list
as a result, the entire repo can look like a tree. Strictly speaking, this is a DAG – or “directed acyclic graph”.
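The parent "back pointers" can be inspected directly. The following is a minimal sketch in a scratch repository; the identity settings and commit messages are placeholders, and the commits are empty (--allow-empty) just to keep the example short.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "first (root commit)"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)

echo "second commit: $second"
parent_of_second=$(git rev-parse "$second^")   # follow the back pointer one step
echo "its parent:    $parent_of_second"

git cat-file -p "$second"                      # raw commit data, including the "parent" line
roots=$(git rev-list --max-parents=0 HEAD)     # commits that have no parents
echo "root commit:   $roots"
```

The "parent" line printed by git cat-file is exactly the recorded SHA of the previous commit; the root commit has no such line.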
a branch represents an active line of development; you use different branches to track different lines of development. Without branches, working on multiple features or versions simultaneously would be a nightmare!
in practice, a branch is a symbolic name that points to the most recent commit on that line of development. This “tip” commit is called the head of that branch.
When you checkout a branch and make a new commit on it, the “branch” now points to the new head node. In other words, a branch is a pointer that moves as you make commits.
A repository can track many branches, but the working tree is associated with at most one branch at a time.
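Here is a small sketch of a branch behaving as a moving pointer. The branch name topic is hypothetical, and the commits are empty only to keep the demo short.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"

git checkout -q -b topic                   # create the branch and check it out
before=$(git rev-parse topic)              # the head of "topic" right now
git commit -q --allow-empty -m "work on topic"
after=$(git rev-parse topic)               # the same name now points at the new commit

echo "topic before: $before"
echo "topic after:  $after"
current=$(git symbolic-ref --short HEAD)   # the one branch the working tree is on
echo "checked out:  $current"
```

The old head is not lost: it is simply the parent of the new one.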
a tag is also a symbolic name given to a commit. Tags don’t “move” as you make new commits – they stay where they are.
Tags are used to mark specific milestones in a project, like “v1.3”, or “v2.9”, etc.
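By contrast, a tag stays put. This sketch uses a lightweight tag named v1.3 (borrowed from the example above) in a scratch repository with placeholder commits.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "release work"
git tag v1.3                               # name the current commit
tagged=$(git rev-parse v1.3)

git commit -q --allow-empty -m "more work" # the branch moves on...
tag_now=$(git rev-parse v1.3)              # ...but the tag does not
head_now=$(git rev-parse HEAD)

echo "v1.3 when created: $tagged"
echo "v1.3 now:          $tag_now"
echo "branch head now:   $head_now"
```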
you can start from a commit X and follow the parent SHAs (“arrows”) until you come to a root commit. The commits you found in this process are all said to be reachable from X
if you have commit X in some other repo, git guarantees that that repo also has all these “reachable” commits.
(If you know about grafts or shallow clones, you’re too advanced to be reading this document ;-)
when you merge a branch into the currently checked out branch, git uses reachability to determine what commits to actually bring in. All the commits reachable from the branch being merged, which are not already reachable from the current branch, are pulled in and applied
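Reachability can be counted with git rev-list. The sketch below builds a tiny history with hypothetical commits A, B, C and D and a hypothetical branch topic, then asks which commits a merge would bring in.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "A"
base=$(git symbolic-ref --short HEAD)      # whatever the default branch is called
git checkout -q -b topic
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"
git checkout -q "$base"
git commit -q --allow-empty -m "D"

reachable=$(git rev-list --count topic)        # A, B, C: everything reachable from topic
incoming=$(git rev-list --count HEAD..topic)   # B, C: reachable from topic, not from HEAD
echo "reachable from topic: $reachable, would be merged: $incoming"

git merge -q --no-edit topic                   # both lines have moved, so this makes a merge commit
words=$(git rev-list --parents -n 1 HEAD | wc -w)
parents=$((words - 1))                         # the line is: merge SHA, then its parent SHAs
after=$(git rev-list --count HEAD..topic)      # nothing left to bring in
echo "merge commit parents: $parents, still missing: $after"
```

The HEAD..topic notation means "reachable from topic but not from HEAD", which is the same set the merge description above talks about.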
the index is a staging area for the next commit; when you commit, the current index is turned into a real commit object
the index is also called stage or cache
staging a file (git add file) marks its current contents for inclusion in the next commit
unstaging a file (git reset file, or in the case of a new file, git rm --cached file) undoes all staging since the last commit
the unstaged changes are the difference between index and working tree (git diff)
the uncommitted but staged changes are the difference between HEAD and index (git diff --cached)
see uses of index for the cool things you can do with it
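The staging commands above can be tried out in a scratch repository. This is only a sketch; file.txt and its contents are made up for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "add file.txt"

echo "two" >> file.txt                          # edit the working tree only
unstaged=$(git diff --name-only)                # index vs working tree
git add file.txt                                # stage the current contents
staged=$(git diff --cached --name-only)         # HEAD vs index
unstaged_after_add=$(git diff --name-only)      # index and working tree now agree
git reset -q -- file.txt                        # unstage; the edit stays in the working tree
staged_after_reset=$(git diff --cached --name-only)
unstaged_after_reset=$(git diff --name-only)

echo "unstaged before add:  $unstaged"
echo "staged after add:     $staged"
echo "staged after reset:   ${staged_after_reset:-none}"
echo "unstaged after reset: $unstaged_after_reset"
```

The "--" before the file name just tells git that what follows is a path, not a branch or commit name.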
In a distributed VCS, every developer’s workstation has a full copy of the entire repository. It is therefore called a clone – you clone the remote repository, you don’t merely checkout the latest version from it. (Of course, you can have several clones on the same workstation if you wish).
So where does checkout come in? In git, you checkout a branch from your local repo, so this happens after the clone. Remember this is a full repo, so you already have all the branches that the parent repo had when you cloned.
By default, after the clone is done, git will checkout the same branch that was currently checked out in the repository you’re cloning from. However, you can checkout some other branch at any time (and if you’re using git, you will checkout and manage multiple branches on your local repo all the time, otherwise you’re not really using git!)
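A quick sketch of clone followed by checkout, using a local directory as the "remote" so that nothing needs a network. The names parent, myclone and feature are hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Set up a small repository to clone from.
git init -q parent
git -C parent config user.name "Example User"     # placeholder identity for the demo
git -C parent config user.email "user@example.com"
git -C parent commit -q --allow-empty -m "first"
default=$(git -C parent symbolic-ref --short HEAD) # branch checked out in the parent
git -C parent branch feature                       # a second branch in the parent

git clone -q "$work/parent" myclone                # full copy, not just the latest version
cloned_branch=$(git -C myclone symbolic-ref --short HEAD)
echo "clone starts on: $cloned_branch"
git -C myclone branch -r                           # the parent's branches are already here

git -C myclone checkout -q feature                 # purely local; makes a local "feature" from origin/feature
now_on=$(git -C myclone symbolic-ref --short HEAD)
echo "now on: $now_on"
```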
Every remote repository (often called just a remote) has a URL associated with it, which tells your (local) git client how to reach it. There are typically 4 kinds of Git URLs:
ssh://[user@]host.xz[:port]/path/to/repo.git/
http[s]://host.xz[:port]/path/to/repo.git/
git://host.xz[:port]/path/to/repo.git/ (note that this is an unauthenticated protocol suitable only for allowing downloads of open source or similar software)
file:///full/path/to/reponame
See ‘man git-clone’ for all the allowed syntaxes for git URLs.
You can refer to (or fetch from, and push to) more than one remote repo in your clone, each with its own URL. After a while it gets inconvenient to use the full URLs in your git fetch and git push commands, so git allows you to give an easy to remember “nickname” for each “remote”. I could, for instance, do this:
git remote add sejal ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git
After this I could refer to the longer URL by the shortname “sejal” in most git commands and it would be the same thing.
(Note: for convenience, a ‘remote’ called ‘origin’ is automatically created when you clone a repo, pointing to the repo you cloned from.)
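To check what a nickname stands for, you can ask git to print it back. This sketch reuses the example URL from above in a scratch repository; adding a remote only records the URL, so the host is never contacted.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

git remote add sejal ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git
url=$(git remote get-url sejal)     # expand the nickname back to the full URL
echo "sejal is: $url"
git remote -v                       # list every nickname with its fetch and push URLs
```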
Now that you know what a nickname is, you can understand what “origin” means. “origin” is just the default name given to the remote repository when you do a git clone. So for example, if I do
git clone ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git
then, after the clone completes, git automatically creates a remote with the nickname origin, which points to ssh://sitaram@sejal.herlab.ourcompany.com/path/to/repo.git. It’s just a convenience thing. You can delete that nickname, you can rename it to something else, etc., if you like.
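Here is a sketch showing that origin is an ordinary nickname. It clones from a local directory instead of an ssh URL so it runs anywhere; the names upstream, myclone and upstream-copy are made up.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A small repository to clone from.
git init -q upstream
git -C upstream config user.name "Example User"     # placeholder identity for the demo
git -C upstream config user.email "user@example.com"
git -C upstream commit -q --allow-empty -m "first"

git clone -q "$work/upstream" myclone
remotes=$(git -C myclone remote)                    # the nickname created by the clone
echo "after clone:  $remotes -> $(git -C myclone remote get-url origin)"

git -C myclone remote rename origin upstream-copy   # rename it to something else
renamed=$(git -C myclone remote)
echo "after rename: $renamed"

git -C myclone remote remove upstream-copy          # or delete it entirely
left=$(git -C myclone remote)
echo "after remove: ${left:-no remotes}"
```

Removing the nickname does not touch the commits you already have locally; it only forgets where they came from.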