Last time we learnt how Git stores data inside its file database. But how do we commit something? This post shows how git commit works internally.
For this lesson, we don’t need very complex preparations. A clean repository is enough.
git init TestRepo #initializes repository in the folder TestRepo
As we remember, this creates a hidden .git folder containing almost no data. In particular, .git/objects has no files inside.
In the previous post, the folder contained more items. That’s because I examined it with the SmartGit tool, which added some items of its own.
Now, we see only three files.
- description – containing only the phrase "Unnamed repository; edit this file 'description' to name the repository.". Not relevant for our purposes
- config – with the initial configuration of our repository. Not relevant now
- HEAD – containing the phrase "ref: refs/heads/master". We probably suspect that it stores a pointer to the branch. But when we look into the .git/refs/heads folder, where the master file should live, we’ll see that it’s empty. For now.
Let’s play with blobs
We will add a file now… but not in the usual way, with Windows Explorer, an IDE or Notepad. We will add it directly under Git’s hood.
echo 'Hello world!' | git hash-object -w --stdin #cd0875583aabe89ee197ea133980a9085d08e497
What did we do? The echo command prints Hello world!, and the pipe passes it to git hash-object, which hashes it the way I described last time. The -w switch writes the object to the database. Without --stdin, Git would expect a path to the file to be hashed. With the switch, it hashes the standard input stream (in other words, the Hello world! text that we’ve just piped in). How did Git know that we wanted to hash it as a blob, not a tree? We didn’t provide a type, and blob is the default 🙂
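If you’d like to double-check what git hash-object does here, the sketch below rebuilds the blob ID without Git at all. It assumes a Unix-like shell with sha1sum (GNU coreutils) and a repository that uses the default SHA-1 object format; the variable names are mine, not Git’s.

```shell
# Rebuild a blob ID by hand: SHA-1 over "blob <size>\0<content>".
content='Hello world!'

# echo appends a newline, so the hashed payload includes it.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Header first (type, space, size, NUL byte), then the content itself.
blob_id=$( { printf 'blob %s\000' "$size"; printf '%s\n' "$content"; } \
  | sha1sum | cut -d' ' -f1 )

echo "size: $size"
echo "id:   $blob_id"
```

If the header-plus-content recipe from last time holds, the printed ID should match the one returned by git hash-object above.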
Now we can examine our database:
find .git/objects -type f #find all object files
#.git/objects/cd/0875583aabe89ee197ea133980a9085d08e497
git cat-file -p cd0875583aabe89ee197ea133980a9085d08e497
#Hello world!
Great! We’ve just added a new blob using Git commands! What next? Check out what will happen now:
git cat-file -p cd0875583aabe89ee197ea133980a9085d08e497 > Hello.txt
That’s right. I’ve just restored the file Hello.txt from the blob! Think about it each time you accidentally delete a file and Git lets you bring it back 🙂
Moreover, watch this:
echo 'Hello world 2' > Hello.txt #change file content
git hash-object -w Hello.txt #store the new content in separate blob
#warning: LF will be replaced by CRLF in Hello.txt.
#The file will have its original line endings in your working directory
#b64ccf45c79f05cb2db2d3f3119070279f05064f
Right, now I can freely restore content No. 1 or No. 2, depending on which ID I pass to the git cat-file command 🙂
Recall Git concepts…
Notice that I could restore my blobs to any file I wanted. This is because blobs don’t store filenames, only contents. Trees store the names, but as I told you last time, it’s not very easy to create a valid tree file and place it in the database. Fortunately, there is a trick to do this easily 🙂
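A quick way to convince yourself that the filename plays no part in a blob’s ID: hash the same content under two different names. This is only a sketch; it assumes git is on the PATH and uses a throwaway temp folder, and the file name Renamed.txt is made up for the example.

```shell
# Same content, two different file names, in a throwaway folder.
tmp=$(mktemp -d)
printf 'Hello world!\n' > "$tmp/Hello.txt"
printf 'Hello world!\n' > "$tmp/Renamed.txt"

# Without -w nothing is written; we only ask for the IDs.
id_a=$(git hash-object "$tmp/Hello.txt")
id_b=$(git hash-object "$tmp/Renamed.txt")

echo "$id_a  Hello.txt"
echo "$id_b  Renamed.txt"
```

Both lines should show the same ID, which is why one blob can later appear under several names in different trees.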
I’m sure you’re an experienced Git user and you know the classic diagram of Git’s three areas: working directory, staging area and repository.
What’s interesting here is how these areas map to folders and files.
- The working directory is just our project’s directory. In our case – the TestRepo folder. This is where we’re working, writing code etc.
- The staging area was for me always a virtual place where my work is gathered before I commit it. Now we can see that it has a physical form too – the .git/index file.
- The repository is our .git/objects folder. We’ve seen this already – all data is stored there. What’s interesting, and maybe I will tell you about it in the future – it’s easy to add something there, but not so easy to remove. That’s why Git is so robust and why it’s hard to lose work permanently.
… and create a staging area
So, what’s the trick? Building a tree by echoing the text to be hashed is difficult. But it’s very easy to create a tree from the staging area. The problem is that we don’t have a staging area yet. If we had added the file with the git add command, it would have been created automatically. But we placed our blobs directly into the database (.git/objects).
git update-index to the rescue
Fortunately, we can get our blob and update the index with it. Shall we?
git update-index --add --cacheinfo 100644 cd0875583aabe89ee197ea133980a9085d08e497 Hello.txt
The magic above means:
- git update-index – update the index
- --add – add the entry if it isn’t in the index yet
- --cacheinfo – register an object that is already in the database, instead of reading a file from the working directory
- 100644 – as a normal file (two more options are available: 100755 – executable, and 120000 – symbolic link)
- cd0875… – stored in the blob with this ID
- Hello.txt – as a file with this name
The command prints nothing, but a new index file has just been created inside the .git folder 🙂
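To see the effect in isolation, here is a small sketch that repeats the step in a scratch repository (created with mktemp, so TestRepo stays untouched) and checks for .git/index before and after. The before/after variables are just for illustration.

```shell
# Scratch repository, separate from TestRepo.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Put a blob straight into the object database.
blob=$(echo 'Hello world!' | git hash-object -w --stdin)

# The index file does not exist until something is staged.
if [ -f .git/index ]; then before='present'; else before='absent'; fi
git update-index --add --cacheinfo 100644 "$blob" Hello.txt
if [ -f .git/index ]; then after='present'; else after='absent'; fi

echo "index before: $before, after: $after"
# Show what is staged: mode, blob ID, stage number, path.
git ls-files --stage
```

Note that this scratch repository has no Hello.txt in its working directory at all: with --cacheinfo the entry is built purely from the blob ID and the name we passed.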
So, we have the staging area ready now
Now the last step – use git write-tree to reflect the staging area as the new tree object in the database.
git write-tree
#72b177c6a7d277b82751ec26e00f356db571f4bd
git cat-file -p 72b177c6a7d277b82751ec26e00f356db571f4bd
#100644 blob cd0875583aabe89ee197ea133980a9085d08e497 Hello.txt
Voila. We have the tree. We can also examine the index:
git ls-files --stage
#100644 cd0875583aabe89ee197ea133980a9085d08e497 0 Hello.txt
It still contains the first version of Hello.txt. So the next step will be to create a new tree with the second version of the Hello.txt AND the first version, but in the subfolder:
git update-index --add --cacheinfo 100644 b64ccf45c79f05cb2db2d3f3119070279f05064f Hello.txt
git read-tree --prefix=NestedFolder 72b177c6a7d277b82751ec26e00f356db571f4bd
git ls-files --stage
#100644 b64ccf45c79f05cb2db2d3f3119070279f05064f 0 Hello.txt
#100644 cd0875583aabe89ee197ea133980a9085d08e497 0 NestedFolder/Hello.txt
git write-tree
#9aa6b0a47fe2d85baffaedd4f33b4db8aa89ad4d
git cat-file -p 9aa6b0a47fe2d85baffaedd4f33b4db8aa89ad4d
#100644 blob b64ccf45c79f05cb2db2d3f3119070279f05064f Hello.txt
#040000 tree 72b177c6a7d277b82751ec26e00f356db571f4bd NestedFolder
We put version 2 of the Hello.txt into the index, then read the first tree into the index, and then reflected the current index state into the new tree. Simple, right?
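If you want to replay the whole sequence without copying IDs by hand, here is a sketch that does it in a scratch repository and then lists the final tree recursively with git ls-tree. The variable names (v1, v2, tree1, tree2) are mine, and I added a trailing slash to the --prefix value because, as far as I know, older Git versions require it.

```shell
# Scratch repository, separate from TestRepo.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Two versions of the content, stored as blobs.
v1=$(echo 'Hello world!' | git hash-object -w --stdin)
v2=$(echo 'Hello world 2' | git hash-object -w --stdin)

# First tree: version 1 as Hello.txt.
git update-index --add --cacheinfo 100644 "$v1" Hello.txt
tree1=$(git write-tree)

# Second tree: version 2 as Hello.txt, first tree nested in a subfolder.
git update-index --add --cacheinfo 100644 "$v2" Hello.txt
git read-tree --prefix=NestedFolder/ "$tree1"
tree2=$(git write-tree)

# Walk the new tree recursively and print every path in it.
paths=$(git ls-tree -r --name-only "$tree2")
echo "$paths"
```

The listing should contain Hello.txt and NestedFolder/Hello.txt, the same two entries git ls-files --stage showed for the index.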
And now we can commit
It won’t surprise you that a commit can be created in a similar way, using low-level Git commands. We will create two commits, one for each of the trees we’ve just created.
git commit-tree 72b177c6a7d277b82751ec26e00f356db571f4bd -m 'First commit'
#7609f812df036cbec928a2c0c4636a74bb14b17b
git commit-tree 9aa6b0a47fe2d85baffaedd4f33b4db8aa89ad4d -m 'Second commit' -p 7609f
#4ffb709b2bb690c026adbcdd6ccc47bfb98b6678
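A commit object is plain text too, so we can look inside it with git cat-file. The sketch below builds two commits in a scratch repository and prints the second one. The author and committer values are placeholders I made up so that commit-tree works on a machine without user.name and user.email configured, and for brevity both commits point at the same tree. Your commit IDs will differ from mine on every run, because the timestamp is part of what gets hashed.

```shell
# Scratch repository, separate from TestRepo.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Placeholder identity (an assumption for this example only).
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'

# Blob -> index -> tree, exactly as before.
blob=$(echo 'Hello world!' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" Hello.txt
tree=$(git write-tree)

# Two commits; the second names the first as its parent.
first=$(git commit-tree "$tree" -m 'First commit')
second=$(git commit-tree "$tree" -m 'Second commit' -p "$first")

# Print the commit: tree, parent, author, committer, blank line, message.
commit_text=$(git cat-file -p "$second")
echo "$commit_text"
```

The parent line is what chains commits into a history; nothing in the commit says which branch it belongs to.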
Two commits are ready. Let’s check the history then:
git log
#fatal: your current branch 'master' does not have any commits yet
git log 413cd #provide starting point
On the one hand, we have two commits and we can see the history… But on the other, git log without an additional parameter (the starting point ID) complains that master doesn’t have any commits yet. How come? To check, I ran git prune, which removes all unreachable objects from the object database. And guess what…
git prune
git log 413cd
#fatal: ambiguous argument '413cd': unknown revision or path not in the working tree.
There is no such commit anymore. I wanted to find out what happened, so I wrote a small PowerShell script (hopefully the code is self-explanatory):
$border = "*************************************"
$repoLocation = "C:\Temp\deleteme\deletethis\TestRepo"
Set-Location -Path $repoLocation
$gitDbLocation = Join-Path -Path $repoLocation -ChildPath ".git" -AdditionalChildPath "objects"
$objectsInDb = Get-ChildItem $gitDbLocation -Recurse -File
foreach ($obj in $objectsInDb) {
    $containingFolderName = Split-Path $obj.PSParentPath -Leaf
    $objectName = $obj.Name
    $fullHash = $containingFolderName + $objectName
    Write-Host $border
    Write-Host "SHA-1: " $fullHash
    $objectType = git cat-file -t $fullHash
    Write-Host "Object type: " $objectType
    git cat-file -p $fullHash
    Write-Host $border
}
It lists all items in the .git/objects folder. We can see that only trees and blobs are left. The commits were removed as unreachable objects: no branch or other reference pointed to them.
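Had I wanted to know in advance what git prune was going to delete, a dry run would have told me. Here is a sketch in a scratch repository, again with a made-up placeholder identity; the -n switch makes git prune only report the objects it would remove.

```shell
# Scratch repository, separate from TestRepo.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Placeholder identity (an assumption for this example only).
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'

# Blob -> index -> tree -> commit, with no branch pointing at the commit.
blob=$(echo 'Hello world!' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" Hello.txt
tree=$(git write-tree)
commit=$(git commit-tree "$tree" -m 'First commit')

# Dry run: report what would be pruned, delete nothing.
would_prune=$(git prune -n)
echo "$would_prune"

# The commit is still in the database after the dry run.
git cat-file -t "$commit"
```

In my understanding the blob is safe because the index still refers to it, which would also explain why the blobs and trees survived the real prune above; treat that as my reading of the behaviour rather than a guarantee.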
Summary
This time we checked how git commit works internally. It’s just a convenience (porcelain) command that runs all these plumbing, low-level instructions we tested today. In the next chapter, we will recreate the pruned commits using the same technique described here and we will try to figure out how to place them in the master branch for good. See you!