In the two previous parts, we dealt with Git objects stored as compressed files. That might have been a bit overwhelming, so today we will work mostly with plain text files, because branches, tags, and refs in general are just files, like almost everything in Git.
Last time we lost our two newly created commits, as they were unreachable (“dangling”) objects, not attached to anything. But what does “attached” mean? Let’s create the commits again (the trees we had are still there) and try to make them full-fledged items.
#We have two trees that we would like to commit. So we take their IDs and write them as commits.
git commit-tree 72b177c6a7d277b82751ec26e00f356db571f4bd -m 'First commit'
#1308ccd2087d1a02bc7cd747bf0bed2a78d30ccd
#The second commit names the first one as its parent (-p)
git commit-tree 9aa6b0a47fe2d85baffaedd4f33b4db8aa89ad4d -p 1308ccd2087d1a02bc7cd747bf0bed2a78d30ccd -m 'Second commit'
#7cb50860d4f0edae6b584aee195e243e2cf6145a
git log
#fatal: your current branch 'master' does not have any commits yet
git log 7cb5 --pretty=oneline
#7cb50860d4f0edae6b584aee195e243e2cf6145a Second commit
#1308ccd2087d1a02bc7cd747bf0bed2a78d30ccd First commit
The situation is the same as before. We have two commits, and the first is the parent of the second, but we cannot just use git log to see the history, as master doesn’t have any commits yet. How come? If we committed our work, where are the commits?
They exist as files in the .git/objects folder and can be accessed by ID. That’s why git log 7cb5 works. But we would like them to be accessible in a more friendly way.
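To see that such a commit really is just a file, here is a minimal sketch in a throwaway repository. The temporary folder, the placeholder identity, and the use of an empty tree are assumptions made for this illustration, not part of the walkthrough above.

```shell
# Throwaway repository so nothing here touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity for the commit (an assumption of this sketch).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# An empty tree is enough to hang a commit on.
tree=$(git mktree </dev/null)
commit=$(git commit-tree "$tree" -m 'First commit')

# Unpacked objects live at .git/objects/<first 2 characters>/<remaining characters>.
prefix=$(printf '%s' "$commit" | cut -c1-2)
rest=$(printf '%s' "$commit" | cut -c3-)
object_file=".git/objects/$prefix/$rest"
ls -l "$object_file"

# Git can read the object back by ID even though no branch points to it.
object_type=$(git cat-file -t "$commit")
echo "$object_type"
```

The commit is stored and readable, yet a plain git log would still complain, because nothing refers to it.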
THE ULTIMATE DIAGRAM OF ALMOST EVERYTHING IN GIT
Please take a look at the diagram above. It shows the structure of the files inside the .git folder. I hope it’s clear and lets you see how elegant this whole system is. It should also shed some light on the topics I’d like to describe in this post: branches, tags and remotes. They are all simple files, and we will examine them all 🙂
If master doesn’t have any commits yet, but the commits exist, why not connect them to the master branch? It’s very simple. All we have to do is create a file named master inside the .git/refs/heads folder. As simple as that:
echo 7cb50860d4f0edae6b584aee195e243e2cf6145a > refs/heads/master
git log --pretty=oneline
#7cb50860d4f0edae6b584aee195e243e2cf6145a (HEAD -> master) Second commit
#1308ccd2087d1a02bc7cd747bf0bed2a78d30ccd First commit
cat refs/heads/master
#7cb50860d4f0edae6b584aee195e243e2cf6145a
All we had to do was write the ID of the latest commit into the file .git/refs/heads/master. Note that I had to write the full ID, because Git reads this file as is and does not expand abbreviations, and that the path is relative because I was inside the .git folder at that moment. Git now knows that master points to that specific commit, and everything works smoothly. The commit hierarchy is connected to the main branch.
As you can imagine, writing hashes by hand is not what we would like to do, so Git provides us with the helper command git update-ref. It has at least two advantages. First, there is no need to provide the full ID; a unique prefix is enough. Second, the command performs validation, so there is no (simple) way to pass a wrong ID.
Let’s write master once again using this command:
rm refs/heads/master #delete master file
git update-ref refs/heads/master fff
#fatal: fff: not a valid SHA1
git update-ref refs/heads/master 7cb5
cat refs/heads/master
#7cb50860d4f0edae6b584aee195e243e2cf6145a
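Both advantages of git update-ref can be checked in isolation. The following is a minimal sketch in a throwaway repository; the temporary folder, the placeholder identity and the empty tree are assumptions of the sketch.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity (an assumption of this sketch).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tree=$(git mktree </dev/null)
commit=$(git commit-tree "$tree" -m 'First commit')

# Validation: a bogus ID is rejected and the command exits with an error.
if git update-ref refs/heads/master fff 2>/dev/null; then
    bogus_result=accepted
else
    bogus_result=rejected
fi
echo "$bogus_result"

# Abbreviation: a unique prefix of the ID is enough.
short=$(printf '%s' "$commit" | cut -c1-8)
git update-ref refs/heads/master "$short"

# The ref now resolves to the full ID.
resolved=$(git rev-parse refs/heads/master)
echo "$resolved"
```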
In order to keep our work aligned with the ultimate diagram, let’s create another branch – Feature1 pointing to the first commit.
git update-ref refs/heads/Feature1 1308
cat refs/heads/feature1 #lowercase works here only because the file system (Windows) is case-insensitive; on a case-sensitive one, use Feature1
#1308ccd2087d1a02bc7cd747bf0bed2a78d30ccd
So, once again, a branch is a file: a simple text file that points to a specific commit. That’s why creating a branch is so fast in Git. Creating a new branch amounts to what update-ref does: writing the current commit’s ID into a new file.
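As a quick check of that claim, the sketch below creates a branch with the regular git branch command and reads the resulting file. It assumes the default file-based ref storage; the branch name Feature2, the temporary folder and the placeholder identity are made up for the illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity (an assumption of this sketch).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Make sure HEAD refers to master regardless of the configured default branch name.
git symbolic-ref HEAD refs/heads/master

tree=$(git mktree </dev/null)
commit=$(git commit-tree "$tree" -m 'First commit')
git update-ref refs/heads/master "$commit"

# The regular porcelain command...
git branch Feature2

# ...leaves behind a one-line text file holding the current commit ID.
branch_file_content=$(cat .git/refs/heads/Feature2)
echo "$branch_file_content"
```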
Concept of the current commit – HEAD
But how does Git know the current commit? The .git/HEAD file is the answer. It contains text like ref: refs/heads/master, which of course points to the master file that we’ve just created. If HEAD is empty, Git stops treating the repository as valid and produces an error message: fatal: not a git repository (or any of the parent directories): .git
So it’s better not to mess with this file 🙂 When we check out a branch, its content changes. Let’s check:
cat HEAD
#ref: refs/heads/master
cd ..
git checkout Feature1
#Switched to branch 'Feature1'
cat .git/HEAD
#ref: refs/heads/Feature1
Indeed, each time we switch branches, HEAD’s content changes. As you might suspect, instead of running git checkout we can also just rewrite the file, although that moves only the pointer and leaves the working tree untouched. Here, too, Git offers a helper command: git symbolic-ref, which refuses to point HEAD at anything outside the .git/refs folder.
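A minimal sketch of git symbolic-ref at work, run in a throwaway repository (the temporary folder, the placeholder identity and the deliberately wrong target objects/Feature1 are assumptions chosen for the illustration):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity (an assumption of this sketch).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Point HEAD at a branch; the branch does not even have to exist yet.
git symbolic-ref HEAD refs/heads/Feature1
head_content=$(cat .git/HEAD)
echo "$head_content"

# A target outside refs/ is refused, and HEAD stays as it was.
if git symbolic-ref HEAD objects/Feature1 2>/dev/null; then
    outside=accepted
else
    outside=rejected
fi
echo "$outside"
```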
So, looking at our ultimate diagram, we know a little bit more about HEAD, refs/heads and objects folders. Now we will talk about tags, stored in (surprise:)) refs/tags directory.
Tags are useful helpers that allow naming commits in a friendly manner. There are two types of tags.
Lightweight tags are just nicely named references, very similar to branches:
#we can also create them just like the branches
git update-ref refs/tags/InitialCommit 1308c
cat .git/refs/tags/InitialCommit
#1308ccd2087d1a02bc7cd747bf0bed2a78d30ccd
git checkout InitialCommit
#Note: switching to 'InitialCommit'.
#You are in 'detached HEAD' state.
cat .git/HEAD
#1308ccd2087d1a02bc7cd747bf0bed2a78d30ccd
We created a lightweight tag, switched to it, and put our HEAD into a “detached” state.
The attached state means that HEAD holds a symbolic reference to a branch, such as ref: refs/heads/master. Detached is the opposite: HEAD holds a commit ID directly, as we have now (see the comments above), no matter whether that commit is the latest one on any branch. We made HEAD point to the first commit, and when we run git checkout Feature1, which points to exactly the same commit 1308c, HEAD will return to the attached state. Maybe a little bit strange, but at least now we know exactly what these two terms mean.
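The difference between the two states can also be checked from a script: git symbolic-ref -q HEAD succeeds only when HEAD is a symbolic reference. A minimal sketch, assuming a throwaway repository with a single commit on an empty tree and a placeholder identity:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity (an assumption of this sketch).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/master
tree=$(git mktree </dev/null)
commit=$(git commit-tree "$tree" -m 'First commit')
git update-ref refs/heads/master "$commit"

# Attached: HEAD names a branch.
attached_to=$(git symbolic-ref -q HEAD)
echo "$attached_to"

# Detach HEAD at the very same commit.
git checkout -q --detach
if git symbolic-ref -q HEAD >/dev/null; then
    state=attached
else
    state=detached
fi
detached_head=$(cat .git/HEAD)
echo "$state: $detached_head"

# Checking out the branch attaches HEAD again, although the commit is unchanged.
git checkout -q master
reattached_to=$(git symbolic-ref -q HEAD)
echo "$reattached_to"
```

HEAD ends up detached even though it sits on the latest commit of master, which shows that the state depends on what HEAD contains, not on where the commit is in the history.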
… and annotated
Annotated tags are rather like commits that point not to trees but to other objects, usually commits. Let’s try:
git tag -a BreakingChange 7cb5 -m 'Tagging breaking change'
cat .git/refs/tags/BreakingChange
#615be0719f0c78b5c9fe47ed60fe3bbd9886b2e8
git cat-file -p 615be
#object 7cb50860d4f0edae6b584aee195e243e2cf6145a
#type commit
#tag BreakingChange
#tagger Pawel Szczygielski <pawel.szczygielski@> 1614340432 +0100
So, when we created the annotated tag, a new item was also created inside .git/objects. Our tag points to that item, which contains more data: the tagger, the date and, of course, another reference to the commit (again, check the ultimate diagram to see these connections).
Interestingly, not only commits can be tagged. We can just as easily tag blobs, trees and other tags:
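The two kinds of tags can be told apart by asking Git for the type of the object each one points to. A minimal sketch in a throwaway repository; the tag names Light and Annotated, the temporary folder and the placeholder identity are assumptions of the sketch.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity for the commit and the tag (an assumption of this sketch).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

tree=$(git mktree </dev/null)
commit=$(git commit-tree "$tree" -m 'First commit')

# Lightweight tag: a ref that holds the commit ID itself.
git update-ref refs/tags/Light "$commit"
# Annotated tag: a ref that holds the ID of a separate tag object.
git tag -a Annotated "$commit" -m 'Annotated tag'

light_type=$(git cat-file -t refs/tags/Light)
annotated_type=$(git cat-file -t refs/tags/Annotated)
echo "$light_type $annotated_type"

# "Peeling" the annotated tag follows its object line down to the commit.
peeled=$(git rev-parse 'refs/tags/Annotated^{commit}')
echo "$peeled"
```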
git tag -a Tree 72b1 -m 'Tagging tree'
cat .git/refs/tags/Tree
#c5f16d44bce7845e69954e6be91b3e50cf091536
git cat-file -p c5f16
#object 72b177c6a7d277b82751ec26e00f356db571f4bd
#type tree
#tag Tree
#tagger Pawel Szczygielski
I haven’t tagged anything other than a commit yet. Even though I can see a use for tagging blobs (to mark key resources in a repository, for example), I have noticed that people generally discourage tagging anything other than commits.
I planned to cover remotes in a separate post, but OK: a remote-tracking reference is another kind of reference, just like a tag or a branch, so it belongs here. First, we will set up a remote repository. To keep it simple, let’s create the remote on the same machine, in a sibling folder.
#we're in main repository folder
cd .. #go one level up
mkdir BareRepo #create new folder for the 'remote' repo
cd BareRepo #go there
git init --bare BareRepo.git #initialize bare repo
#Initialized empty Git repository in C:/Temp/BareRepo/BareRepo.git/
cd ../TestRepo #back in our repository
We’re back in our repo, and now we will connect the “remote” bare repository. (“Bare” means, in short, that it contains only what normally lives inside the .git folder, not a working copy of the code itself. Of course the topic is more complex, but this explanation is enough for our purposes.)
Now we will add BareRepo as our remote repo and we will push our code to it.
git remote add origin ../BareRepo/BareRepo.git
git push origin master
#Enumerating objects: 6, done.
#Counting objects: 100% (6/6), done.
#Delta compression using up to 8 threads
#Compressing objects: 100% (3/3), done.
#Writing objects: 100% (6/6), 472 bytes | 157.00 KiB/s, done.
#Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
#To ../BareRepo/BareRepo.git
#* [new branch] master -> master
When we examine BareRepo, we will see that it contains our commit, tree and blob objects. And in our base repo, a new folder has appeared, .git/refs/remotes/origin, containing one file:
cat .git/refs/remotes/origin/master #7cb50860d4f0edae6b584aee195e243e2cf6145a
On the ultimate diagram, we can see that this file contains the ID of our latest commit (Second commit). Git needs to remember the last known state of the remote, and this master file holds that information. Of course, we could also push our Feature1 branch:
git push origin Feature1
#Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
#To ../BareRepo/BareRepo.git
#* [new branch] Feature1 -> Feature1
git push origin Feature1
#Everything up-to-date
When we try to push the same content again, Git reports Everything up-to-date. How does it know? During a push, Git asks the remote which commit its Feature1 points to and compares it with our local Feature1; the IDs match, so there is nothing to send. After a successful push, the .git/refs/remotes/origin/Feature1 file records that state as the last known state of the remote. When we change our Feature1, we’ll be able to push our changes. We all know this mechanism and use it in our daily work, but now we understand how it works (hopefully, at least in general outline…).
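To compare what the local repository remembers about the remote with what the remote itself reports, here is a minimal sketch. It builds its own pair of throwaway repositories; the folder names remote.git and local, the placeholder identity and the empty tree are assumptions of the sketch, not the BareRepo setup from above.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git init -q local
cd local
# Placeholder identity (an assumption of this sketch).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git symbolic-ref HEAD refs/heads/master
tree=$(git mktree </dev/null)
commit=$(git commit-tree "$tree" -m 'First commit')
git update-ref refs/heads/master "$commit"

git remote add origin ../remote.git
git push -q origin master

# What our repository remembers about the remote (a remote-tracking ref)...
tracking=$(git rev-parse refs/remotes/origin/master)
# ...and what the remote itself reports when asked.
advertised=$(git ls-remote origin refs/heads/master | cut -f1)
echo "$tracking"
echo "$advertised"
```

Right after a push both values are the same commit ID. They can drift apart later if someone else pushes to the remote, until the next fetch or push refreshes the remote-tracking file.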