How Git Actually Works (Under the Hood) — blobs, trees, commits

September 5th, 2025 · 6 min read · #dev #git #version-control #tooling #ca-duh

A gentle tour of Git’s content‑addressed storage: blobs, trees, commits, refs, and the index—so complex commands like rebase, cherry‑pick, and reset feel predictable.

TL;DR

Git is a content‑addressed database. Everything it stores (file contents, directory listings, commits, annotated tags) is an object addressed by a hash of its content (SHA‑1 by default; SHA‑256 is available as an opt‑in object format).
A commit points to a tree (snapshot of the repo) plus parent commit(s) and metadata (author, message). Trees point to blobs (file contents) and other trees (subfolders).
Branches and HEAD are just pointers (refs) to commits. Moving a pointer (fast‑forward, reset) changes which history you see without touching any stored object; rebase goes one step further and writes new commits before it moves the pointer.
The three areas explain every “weird” Git moment: HEAD (last committed snapshot), index (staging area), working tree (your files).
“Dangerous” commands are predictable when you think in objects: merge creates a commit with 2 parents, rebase copies commits on top of another base, cherry‑pick copies one commit, reset moves pointers and optionally index/working tree.

The mental model (60 seconds)

commit (hash: c3) ──┐
  tree ─────────────┼─▶ blobs/trees (your files at that commit)
  parent: c2        └─▶ metadata (author, message, timestamp)

c1 ──▶ c2 ──▶ c3   (a simple linear history)
^
ref "main" points here (e.g., c3); HEAD points to "main"

A commit is a Merkle node: its hash covers its tree hash, its parent hash(es), and its metadata. Change any file → new tree and new commit hash.
Refs (e.g., refs/heads/main) are tiny text files containing a hash (or lines in .git/packed-refs once Git has packed them). HEAD is usually a symbolic ref pointing at a branch (ref: refs/heads/main).
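
To make the mental model concrete, here is a minimal sketch in a throwaway repo. It assumes a SHA‑1 repository and that sha1sum is installed (on macOS, shasum would be the stand‑in); the file name and identity are placeholders.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main           # call the first branch "main" whatever the local default is
git config user.email "you@example.com"         # placeholder identity for the demo
git config user.name "You"

# 1) A blob id is the hash of "blob <size>\0" followed by the raw bytes.
by_git=$(printf 'hello\n' | git hash-object --stdin)
by_hand=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)
echo "git:  $by_git"
echo "hand: $by_hand"                           # same id: the name is derived from the content

# 2) A commit id also covers metadata: same tree, different message, different id.
printf 'hello\n' > hello.txt
git add hello.txt
tree=$(git write-tree)
c1=$(echo "message one" | git commit-tree "$tree")
c2=$(echo "message two" | git commit-tree "$tree")
echo "$c1"
echo "$c2"

# 3) HEAD names a branch; the branch names a commit.
git commit -q -m "first"
head_target=$(git symbolic-ref HEAD)            # refs/heads/main
branch_commit=$(git rev-parse refs/heads/main)
echo "HEAD -> $head_target -> $branch_commit"
```

The two ids printed in step 1 match because Git never assigns names; it computes them.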

The objects: blobs, trees, commits (and tags)

| Type | What it stores | You can peek with |
|---|---|---|
| blob | Raw file content (no filename) | git cat-file -p <blob> |
| tree | Directory entries (name, mode, type, hash) | git ls-tree <tree> |
| commit | Pointer to a tree + parent(s) + metadata | git cat-file -p <commit> |
| tag | (Annotated tag) message + signature + target object | git cat-file -p <tag> |

Explore your repo

git rev-parse HEAD          # the current commit id
git cat-file -p HEAD        # see commit: tree + parent + message
git ls-tree -r --name-only HEAD   # list files in the commit snapshot

Objects live under .git/objects/ (loose) or in packfiles (.git/objects/pack/*.pack) after GC/clone to save space.
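
If you would rather not poke at a real project, the same exploration can be sketched in a scratch repo. The file names (README.md, src/app.txt) and the identity are made up for the demo; the walk goes commit → tree → subtree → blob.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

mkdir src
printf 'hello\n' > README.md
printf 'code\n'  > src/app.txt
git add .
git commit -q -m "Add feature X"

commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')             # the snapshot the commit points at
subtree=$(git rev-parse HEAD:src)               # a folder is just another tree
blob=$(git rev-parse HEAD:README.md)            # a file's content, without its name

git cat-file -t "$commit"                       # commit
git cat-file -t "$tree"                         # tree
git cat-file -t "$subtree"                      # tree
git cat-file -t "$blob"                         # blob

git cat-file -p "$commit"                       # tree + author/committer + message
git ls-tree "$tree"                             # entries: mode, type, id, name
git cat-file -p "$blob"                         # hello
```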

The three areas (a.k.a. why “it didn’t commit!”)

HEAD (last commit)  ↔  INDEX (staged)  ↔  WORKING TREE (files)
               git commit           git add       edit files

Working tree: your current files.
Index/staging area: what will go into the next commit.
HEAD: what’s in the current commit.

Common operations:

git add: copy changes from working tree → index.
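git commit: write a tree object from the index, then a commit pointing to it, then advance the current branch to that commit.
git restore --staged path: reset the index entry to HEAD (unstage; your edits stay in the working tree).
git restore path: restore the file from the index to the working tree, discarding unstaged edits (add --source=HEAD to take HEAD’s version instead).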
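
A small sketch of one edit travelling through the three areas and back. It assumes a scratch repo and Git 2.23 or newer (for git restore); note.txt is a made‑up file name.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

printf 'v1\n' > note.txt
git add note.txt
git commit -q -m "v1"

printf 'v2\n' > note.txt                        # edit: only the working tree differs
unstaged_before=$(git diff --name-only)         # working tree vs index
staged_before=$(git diff --cached --name-only)  # index vs HEAD

git add note.txt                                # working tree -> index
unstaged_after_add=$(git diff --name-only)
staged_after_add=$(git diff --cached --name-only)

git restore --staged note.txt                   # index <- HEAD; the file still says v2
staged_after_unstage=$(git diff --cached --name-only)
content_after_unstage=$(cat note.txt)

git restore note.txt                            # working tree <- index (which matches HEAD again)
content_after_restore=$(cat note.txt)

echo "after unstage: $content_after_unstage, after restore: $content_after_restore"
```

git diff compares working tree against index, and git diff --cached compares index against HEAD, which is why the edit shows up in exactly one of them at each step.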

What a commit really contains

Example (abridged):

$ git cat-file -p HEAD
tree a9f3e2...
parent 7c1b6d...
author You <you@example.com> 1693920000 +0200
committer You <you@example.com> 1693920000 +0200

Add feature X

Now inspect the tree:

$ git ls-tree a9f3e2
100644 blob 3b18e5  README.md
040000 tree 8c2a1a  src

Each tree entry lists a mode (100644 regular file, 100755 executable, 040000 subtree), a type, an object id, and a name.

Branches & HEAD (moving pointers)

A branch is just a ref pointing to a commit. New commits advance the branch pointer.
Fast‑forward merge: move the pointer forward to the other commit (no new commit).
Merge commit: create a new commit with two parents when histories diverged.

Before merge (C branched from A; main has since gained B):
main:     A ── B
           ╲
feature:    C

After merge commit M (M has parents B and C):
main:     A ── B ── M
           ╲       ╱
feature:    C ─────
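
Both kinds of merge can be reproduced in a scratch repo. This is a sketch, not a recipe: commit_file is a hypothetical helper defined here, and the branch names feature and hotfix are arbitrary.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

commit_file() {                                 # hypothetical helper: one new file, one commit
  printf '%s\n' "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A
git branch feature                              # feature and main both point at A
commit_file B                                   # main:    A - B
git switch -q feature
commit_file C                                   # feature: A - C (diverged from main)
git switch -q main
b=$(git rev-parse main)
c=$(git rev-parse feature)

git merge -q --no-edit feature                  # diverged histories -> new commit M
m=$(git rev-parse HEAD)
git rev-list --parents -n 1 HEAD                # prints three ids: M, then its parents B and C

git switch -q -c hotfix                         # hotfix starts at M
commit_file D
git switch -q main
git merge -q hotfix                             # main is an ancestor of hotfix -> fast-forward
git log --graph --oneline
```

After the second merge, main and hotfix name the same commit: the pointer moved and nothing new was written.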

Detached HEAD:

git checkout <commit>   # HEAD now points directly to a commit, not a branch

Create a branch to keep work:

git switch -c experiment
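
A sketch of what "detached" means at the ref level, assuming a scratch repo with two commits. git symbolic-ref -q HEAD exits non‑zero when HEAD holds a raw commit id rather than a branch name.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

printf 'one\n' > f.txt
git add f.txt
git commit -q -m "one"
printf 'two\n' > f.txt
git commit -q -am "two"
first=$(git rev-parse HEAD~1)

git symbolic-ref HEAD                           # refs/heads/main (attached)
git checkout -q "$first"                        # detach: HEAD now stores the commit id itself
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "$state"

printf 'try\n' > try.txt
git add try.txt
git commit -q -m "experiment"                   # a commit that no branch points at
orphan=$(git rev-parse HEAD)

git switch -q -c experiment                     # give it a name so it stays reachable
branch=$(git symbolic-ref --short HEAD)
echo "$branch"
```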

Rebase vs merge vs cherry‑pick (copying commits)

Merge: keeps both histories; one new commit ties them together.
Rebase: copy commits onto a new base. The result is a linear history, and every copied commit gets a new id.
Cherry‑pick: copy one commit onto the current branch.

Rebase (concept):
Before:
main:     A ─ B ─ D
               ╲
feature:        C1 ─ C2

After git rebase main (run on feature):
main:     A ─ B ─ D
                   ╲
feature:            C1' ─ C2'   (same changes, new base → new hashes)

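Because rebase rewrites history, a branch you already pushed will no longer fast‑forward on the remote; update it with git push --force-with-lease, which refuses to overwrite remote commits you have not fetched.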
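
The "same changes, new hashes" claim can be checked directly. In this sketch (scratch repo, hypothetical commit_file helper, arbitrary file names) git patch-id fingerprints the diff of the tip commit before and after the rebase.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

commit_file() {                                 # hypothetical helper: one new file, one commit
  printf '%s\n' "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A
commit_file B
git branch feature                              # feature starts at B
commit_file D                                   # main:    A - B - D
git switch -q feature
commit_file C1
commit_file C2                                  # feature: A - B - C1 - C2

old_tip=$(git rev-parse HEAD)
old_patch=$(git show "$old_tip" | git patch-id | cut -d' ' -f1)

git rebase -q main                              # replay C1, C2 on top of D

new_tip=$(git rev-parse HEAD)
new_patch=$(git show "$new_tip" | git patch-id | cut -d' ' -f1)
new_base=$(git rev-parse HEAD~2)                # two steps below the tip sits D

echo "tip:   $old_tip -> $new_tip"
echo "patch: $old_patch == $new_patch"
```

The old tip is not deleted by the rebase; it simply has no branch pointing at it any more.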

`git reset` demystified (what moves where)

| Command | Moves branch? | Moves index? | Moves working tree? | Use it for |
|---|---|---|---|---|
| git reset --soft X | ✅ to X | ❌ | ❌ | “Undo last commit but keep changes staged” |
| git reset --mixed X (default) | ✅ | ✅ (to X) | ❌ | “Unstage changes; keep edits” |
| git reset --hard X | ✅ | ✅ | ✅ | Discard everything and go to X |

X can be anything that resolves to a commit: a hash, a branch name, or a relative ref such as HEAD~1.
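
The table can be verified with git status --porcelain, whose two status columns describe the index and the working tree respectively. A sketch in a scratch repo, with f.txt as a made‑up file:

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

printf 'v1\n' > f.txt
git add f.txt
git commit -q -m "v1"
printf 'v2\n' > f.txt
git commit -q -am "v2"
v2=$(git rev-parse HEAD)

git reset -q --soft HEAD~1                      # branch moves; index and files still hold v2
soft_status=$(git status --porcelain)           # "M  f.txt": change is staged
git reset -q --soft "$v2"                       # put the branch back

git reset -q --mixed HEAD~1                     # branch and index move; files still hold v2
mixed_status=$(git status --porcelain)          # " M f.txt": change is unstaged
git reset -q --hard "$v2"                       # put everything back

git reset -q --hard HEAD~1                      # branch, index, and files all move
hard_status=$(git status --porcelain)           # empty: nothing left to report
hard_content=$(cat f.txt)

printf 'soft: [%s]\nmixed: [%s]\nhard: [%s]\n' "$soft_status" "$mixed_status" "$hard_status"
```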

Merges & conflicts (index stages)

When Git can’t auto‑merge a file, it records up to three versions of it in the index, one per stage:

Stage 1: base version
Stage 2: --ours (current)
Stage 3: --theirs (incoming)

Resolve, then:

git add <file>
git commit   # completes the merge
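
To see the stages for yourself, here is a sketch that forces a conflict in a scratch repo (f.txt and the branch name feature are arbitrary). The :N:path syntax reads a specific stage out of the index.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

printf 'base\n' > f.txt
git add f.txt
git commit -q -m "base"
git branch feature
printf 'ours\n' > f.txt
git commit -q -am "ours"
git switch -q feature
printf 'theirs\n' > f.txt
git commit -q -am "theirs"
git switch -q main

git merge feature >/dev/null 2>&1 || true       # both sides changed the same line: conflict

git ls-files -u                                 # one index entry per stage
stage_count=$(git ls-files -u | wc -l)
stage1=$(git show :1:f.txt)                     # base
stage2=$(git show :2:f.txt)                     # ours
stage3=$(git show :3:f.txt)                     # theirs
merge_head=$(git rev-parse MERGE_HEAD)          # the commit being merged in

printf 'resolved\n' > f.txt                     # resolve by hand
git add f.txt                                   # collapses the three stages into one entry
git commit -q --no-edit                         # completes the merge
git rev-list --parents -n 1 HEAD
```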

Plumbing: build a commit by hand (tiny demo)

echo "hello" > hello.txt
git hash-object -w hello.txt             # writes a blob, prints its id
git update-index --add hello.txt         # stage it (index)
tree_id=$(git write-tree)                # write tree from index
parent=$(git rev-parse --verify HEAD 2>/dev/null || true)   # empty in a brand-new repo
# commit-tree prints the new commit id; capture it instead of copying it by hand
commit_id=$(echo "first commit" | git commit-tree "$tree_id" ${parent:+-p $parent})
# Point the branch at it (assumes main is the branch HEAD is on):
git update-ref refs/heads/main "$commit_id"

This is what git add + git commit automate for you.

Where space savings come from

Git stores snapshots, but uses delta compression in packfiles, so repeated content across versions is efficient.
git gc repacks loose objects; git count-objects -vH shows size.
Git does not record renames; diff and log detect them heuristically by content similarity. Identical content → identical blob hash, so a pure rename adds no new blob.
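
A rough sketch of loose objects turning into a packfile, assuming a scratch repo. The awk patterns pick the count: and in-pack: lines out of git count-objects -v; log.txt and copy.txt are made‑up names.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

for i in 1 2 3; do
  printf 'line %s\n' "$i" >> log.txt            # each version adds one line to the last
  git add log.txt
  git commit -q -m "version $i"
done

cp log.txt copy.txt
git add copy.txt
blob_a=$(git rev-parse :log.txt)
blob_b=$(git rev-parse :copy.txt)               # identical content, identical blob id
git commit -q -m "add copy"

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q                                       # repack loose objects into a packfile
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose: $loose_before -> $loose_after, in pack: $packed"
git count-objects -vH
```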

Safety nets: reflog & friends

The reflog is a local record of where HEAD and each branch pointed recently. If you lost commits after a reset or a rebase:

git reflog
git checkout -b rescue <old-hash-from-reflog>

git fsck --lost-found can locate orphaned objects.
Many commands accept the @{-1} syntax (previous branch).
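
A sketch of a full lose‑and‑recover cycle in a scratch repo. HEAD@{1} means "where HEAD pointed one move ago"; the branch name rescue follows the snippet above and f.txt is a placeholder.

```shell
set -e
cd "$(mktemp -d)"                               # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"         # placeholder identity
git config user.name "You"

printf 'v1\n' > f.txt
git add f.txt
git commit -q -m "v1"
printf 'v2\n' > f.txt
git commit -q -am "v2"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1                      # v2 is no longer on any branch
git reflog -n 3                                 # newest entry first: the reset, then the v2 commit

found=$(git rev-parse 'HEAD@{1}')               # HEAD before the reset
git checkout -q -b rescue "$found"              # a branch makes the commit reachable again
rescued=$(cat f.txt)
prev=$(git rev-parse --abbrev-ref '@{-1}')      # the branch we were on before: main

echo "recovered $found with content $rescued; previous branch: $prev"
```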

Quick mapping: commands → data structure moves

| You run… | Under the hood it… |
|---|---|
| git add . | updates index entries (staged snapshot) |
| git commit | writes a tree from index; writes a commit pointing to that tree (+ parent) |
| git merge | computes new tree; writes a merge commit with 2 parents |
| git rebase | copies commits, writing new ones on a new base; moves branch ref |
| git checkout/switch | moves HEAD (and updates index/working tree to match) |
| git tag -a v1 | writes a tag object pointing at a commit + message/signature |

Quick checklist (become unflappable)

[ ] Think snapshots, not diffs: commits point to trees.
[ ] Remember the three areas: HEAD ↔ index ↔ working tree.
[ ] Branches are pointers; moving them is cheap and reversible (reflog).
[ ] Rebase/cherry‑pick copy commits (new ids). Merge adds a tie‑point.
[ ] Use git cat-file, ls-tree, reflog to debug anything scary.

One‑minute hands‑on plan

Run git cat-file -p HEAD and git ls-tree -r --name-only HEAD.
Make a tiny change; inspect git diff (working tree) → git add → git diff --cached (index).
Commit; run git log --graph --oneline --decorate to see pointers move.
Try git switch -c demo, add a commit, then git rebase main to watch hashes change.
Create a conflict on purpose; while the merge is still in progress, inspect .git/MERGE_HEAD and the stages (git ls-files -u); then resolve and complete the merge.