Git Internals Explained
Explore Git's internal architecture: objects, refs, and the .git directory. Learn how Git stores data and tracks changes under the hood.
Objects
Git uses three main types of objects to store and manage your code:
- Commit
- Tree
- Blob
[Figure: commit objects (red), tree objects (blue), and blob objects (grey)]
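A quick way to see all three types side by side is to ask git cat-file -t for the type of each object in a fresh commit. This is a minimal sketch, assuming git is installed. It works in a throwaway temporary repository, and the file name hello.txt, the commit message, and the user identity are made up for illustration.

```bash
#!/bin/sh
set -eu
# Throwaway repository so nothing real is touched
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# hello.txt is a hypothetical example file
echo "git is awesome" > hello.txt
git add hello.txt
git commit -q -m "first commit"

commit_type=$(git cat-file -t HEAD)           # the commit HEAD points to
tree_type=$(git cat-file -t 'HEAD^{tree}')    # that commit's root tree
blob_type=$(git cat-file -t HEAD:hello.txt)   # the blob holding hello.txt
echo "$commit_type $tree_type $blob_type"     # prints: commit tree blob
```

HEAD^{tree} names the root tree of the commit HEAD points to, and HEAD:hello.txt names the blob stored at that path inside it.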
Blobs (Binary Large Objects)
What are blobs?
- Store the actual contents of files
- Hold the complete content of a file version, not a diff against an earlier version
- Identified by a unique SHA-1 hash computed from their content (20 bytes, written as 40 hexadecimal characters)
Key characteristics:
- Content only: Regular files carry metadata such as creation dates and permissions, but blobs store only the raw file content. The file name and mode are recorded in the tree that points to the blob
- Immutable: Once created, a blob's contents cannot be changed. Any modification creates a new blob with a different hash
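A blob holds content only, so its hash depends on nothing but the bytes of the file. Git hashes a short header (the word blob, a space, the content length in bytes, and a NUL byte) followed by the content. The sketch below reproduces that with printf and sha1sum, assuming the coreutils sha1sum tool is available. blob_hash is a hypothetical helper, not a Git command.

```bash
#!/bin/sh
set -eu
# blob_hash FILE: print the SHA-1 Git would assign to FILE's contents as a blob.
# Hypothetical helper for illustration; assumes sha1sum is installed.
blob_hash() {
  size=$(wc -c < "$1" | tr -d ' ')    # content length in bytes
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'test content\n'  > "$tmp/a.txt"
printf 'test content!\n' > "$tmp/b.txt"   # one extra character

hash_a=$(blob_hash "$tmp/a.txt")
hash_b=$(blob_hash "$tmp/b.txt")
echo "$hash_a"   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
echo "$hash_b"   # an unrelated hash
```

Running git hash-object on a.txt should print the same hash as blob_hash. Changing a single character (b.txt) produces an unrelated hash, which is why a modified file always becomes a new blob.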
Trees
What are trees?
- Represent a directory: each tree lists the entries (files and subdirectories) it contains
- Reference other trees (subdirectories) or blobs (files) by their hashes, recording each entry's name and file mode alongside the hash
- Each tree has its own unique SHA-1 hash
How they work:
- Trees can contain other trees, representing nested directories
- They maintain the structure and organization of your project
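To see a tree's entries, including the mode and hash of each file and subdirectory, this minimal sketch stages a small nested project and prints the root tree and one subtree. It assumes git is installed, and the directory layout and file names are hypothetical.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

# Hypothetical layout: one top-level file and a nested directory
mkdir -p src/utils
echo "readme" > README.md
echo "main"   > src/main.c
echo "util"   > src/utils/helper.c
git add .

root=$(git write-tree)              # root tree object built from the index
git cat-file -p "$root"
# lists "100644 blob <hash>" for README.md, then "040000 tree <hash>" for src

src_tree=$(git rev-parse "$root:src")   # hash of the src/ subtree
git cat-file -p "$src_tree"
# lists "100644 blob <hash>" for main.c, then "040000 tree <hash>" for utils
```

Mode 040000 marks a subdirectory (another tree), and 100644 marks a regular file (a blob).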
Commits
What are commits?
- Represent a complete snapshot of your repository at a specific point in time
- Combine metadata with a pointer to the root tree
Commit contents:
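- Author and committer: the name and email of the person who wrote the change and of the person who committed it (often the same)
- Timestamps: when the change was authored and when it was committed, stored with each identity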
- Commit message: Description of changes
- Parent pointers: References to previous commits (merge commits have multiple parents)
- Tree reference: Points to the root tree object
Important notes:
- Commits store entire snapshots, not just diffs from previous commits
- Identified by a SHA-1 hash (the same hash shown by git log)
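Printing a commit object shows the fields listed above as plain text lines. This is a minimal sketch in a throwaway repository, assuming git is installed. The identity, file, and messages are invented.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "v2" > notes.txt
git commit -q -am "second commit"

# Print the raw commit object HEAD points to. Its lines are:
#   tree <root tree hash>
#   parent <first commit hash>
#   author Example User <user@example.com> <timestamp> <timezone>
#   committer Example User <user@example.com> <timestamp> <timezone>
#   (blank line, then the message "second commit")
git cat-file -p HEAD

parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
```

The first commit has no parent line, and a merge commit has one parent line per parent.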
How changes propagate:
- Updating a file creates a new blob with a different hash
- That changes the hash of the tree containing the file, and of every parent tree up to the root tree
- Which in turn changes the hash of the commit that references the root tree
Efficient storage:
- Only modified files get new blobs
- Unchanged files are referenced, not duplicated
- New commits reference their parent commits
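The propagation and reuse described above can be checked directly by comparing object hashes across two commits. This is a minimal sketch, assuming git is installed. File names and contents are illustrative, and only docs/guide.txt changes between the commits.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: one file at the top level and one inside docs/
mkdir docs
echo "unchanged" > keep.txt
echo "version 1" > docs/guide.txt
git add .
git commit -q -m "first"
c1=$(git rev-parse HEAD)

# Second commit: modify docs/guide.txt only
echo "version 2" > docs/guide.txt
git commit -q -am "second"
c2=$(git rev-parse HEAD)

# Changed file -> new blob -> new docs/ tree -> new root tree -> new commit
blob1=$(git rev-parse "$c1:docs/guide.txt"); blob2=$(git rev-parse "$c2:docs/guide.txt")
docs1=$(git rev-parse "$c1:docs");           docs2=$(git rev-parse "$c2:docs")
root1=$(git rev-parse "$c1^{tree}");         root2=$(git rev-parse "$c2^{tree}")

# Unchanged file: both commits reference the very same blob
keep1=$(git rev-parse "$c1:keep.txt");       keep2=$(git rev-parse "$c2:keep.txt")

# The new commit points back to its parent
parent=$(git rev-parse "$c2^")

echo "guide.txt blob: $blob1 -> $blob2"
echo "docs/ tree:     $docs1 -> $docs2"
echo "root tree:      $root1 -> $root2"
echo "keep.txt blob:  $keep1 -> $keep2 (same)"
```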
Hash uniqueness: Two different people who commit identical files (same names, contents, and directory layout) get the same blob and tree hashes. Their commit hashes differ, because the author information and timestamps differ.
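To check this claim, the sketch below creates two separate repositories with the same file and commits it under two different identities. It assumes git is installed, and commit_as is a hypothetical helper. The names, emails, and fixed date are invented so that the identity is the only difference between the two commits.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# commit_as NAME EMAIL: hypothetical helper that creates a new repository in
# $tmp/NAME, adds an identical file, and commits it under the given identity.
commit_as() {
  dir="$tmp/$1"
  git init -q "$dir"
  echo "same content" > "$dir/file.txt"
  git -C "$dir" add file.txt
  # Same fixed dates for both commits, so only the identity differs
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
  GIT_AUTHOR_DATE="1704067200 +0000" GIT_COMMITTER_DATE="1704067200 +0000" \
    git -C "$dir" commit -q -m "add file"
}

commit_as alice alice@example.com
commit_as bob bob@example.com

for who in alice bob; do
  echo "$who blob:   $(git -C "$tmp/$who" rev-parse HEAD:file.txt)"
  echo "$who tree:   $(git -C "$tmp/$who" rev-parse 'HEAD^{tree}')"
  echo "$who commit: $(git -C "$tmp/$who" rev-parse HEAD)"
done
```

The blob and tree lines match between alice and bob, while the commit lines differ.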
Branches
What are branches?
- Named references to specific commits
- Lightweight pointers that move as you create new commits
How branches work:
- HEAD defines your currently active branch
- git checkout <branch> moves HEAD to point at that branch
- Creating a commit moves the pointer of the currently checked-out branch forward, whether that branch is master or any other branch
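A branch is only a pointer, so creating one just writes a small file holding a commit hash. This is a minimal sketch, assuming git is installed and the repository uses Git's default file-based ref storage. The branch name feature and the file contents are hypothetical.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"

git branch feature               # a new pointer to the same commit
cat .git/HEAD                    # ref: refs/heads/master
cat .git/refs/heads/feature      # same hash as master

git checkout -q feature          # HEAD now names the feature branch
cat .git/HEAD                    # ref: refs/heads/feature
echo "two" >> file.txt
git commit -q -am "second"       # only the checked-out branch moves

master_hash=$(git rev-parse master)
feature_hash=$(git rev-parse feature)
```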
Changes and Workflow
Repository structure:
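- Repository: the collection of commits (with the trees and blobs they reference), stored in the .git folder
- Working directory: the checked-out project files you edit, in the folder that also holds .git
- Staging area (index): where changes are prepared before committing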
File states:
- Tracked: files that were in the last commit or have been added to the staging area
- Untracked: new files Git doesn't know about yet
Changes are registered in the index (staging area) using git add.
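The two file states and the index can be inspected with git status --porcelain and git ls-files --stage. This is a minimal sketch, assuming git is installed. The file names are made up.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

echo "tracked" > staged.txt
echo "new"     > untracked.txt
git add staged.txt              # record staged.txt in the index

git status --porcelain
# A  staged.txt
# ?? untracked.txt

git ls-files --stage            # the index: mode, blob hash, stage number, path
# one line for staged.txt: 100644 <blob hash> 0, then a tab and the path
```

In the porcelain output, A marks a file added to the index and ?? marks an untracked file.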
.git Directory Structure
The .git directory contains everything Git needs:
```
.git/
├── HEAD (file)
├── index (file)
├── objects/
│   ├── 11/
│   │   └── 8f108d76b16a058db9fcb385a5fd640b54e47a
│   └── [other hash folders...]
└── refs/
    └── heads/
        └── master (file)
```
Directory components:
- objects/: stores all Git objects (blobs, trees, commits), subdivided into folders named after the first two characters of each hash so that no single folder holds too many files
- refs/: directory for references
  - heads/: contains one file per branch, holding the hash of the commit that branch points to
    - master: file containing the hash of the latest commit on the master branch
- HEAD: points to the current active branch, with content like ref: refs/heads/master
- index: represents the staging area
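The mapping from an object hash to its place under objects/ can be checked by hand. This is a minimal sketch, assuming git is installed. It uses a throwaway repository and sets HEAD to master explicitly, since newer Git versions may default to a different initial branch name. It does not assume any particular hash value.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master

# Store a blob, then work out where Git put it
hash=$(echo "git is awesome" | git hash-object --stdin -w)
dir=$(echo "$hash" | cut -c1-2)    # first two hex characters: folder name
file=$(echo "$hash" | cut -c3-)    # remaining 38 characters: file name
ls ".git/objects/$dir/$file"

cat .git/HEAD                      # ref: refs/heads/master
```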
Git Commands
Basic Object Inspection
```bash
# Get the type of an object from its hash
git cat-file -t <hash>
# Get the content of an object from its hash
git cat-file -p <hash>
```
Working with Hashes
Generate and store hashes:
```bash
# Get the hash of a string
echo "git is awesome" | git hash-object --stdin
# Get the hash and store the content as an object in the Git database
echo "git is awesome" | git hash-object --stdin -w
```
This creates a blob object stored as:
```
objects/
└── 11/
    └── 8f108d76b16a058db9fcb385a5fd640b54e47a
```
Retrieve object information:
```bash
# Get the object type of a hash
git cat-file -t <hash>
# Get the content of a hash
git cat-file -p <hash>
# Save the content of a hash to a file
git cat-file -p <hash> > hello.txt
```
Note: A new blob is created when you add something to the staging area using git add.
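The note above can be checked by computing a file's hash without writing it, then confirming that the object only appears in the database after git add. This is a minimal sketch, assuming git is installed. hello.txt is a hypothetical file.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

echo "hello" > hello.txt
expected=$(git hash-object hello.txt)   # compute the hash only, nothing written

if git cat-file -e "$expected" 2>/dev/null; then before=yes; else before=no; fi
git add hello.txt                       # writes the blob into .git/objects
if git cat-file -e "$expected"; then after=yes; else after=no; fi
echo "blob stored before add: $before, after add: $after"

git cat-file -p "$expected" > restored.txt   # save the blob's content to a file
```

Without -w, git hash-object only computes the hash. git cat-file -e exits successfully only when the object exists.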
Staging Operations
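```bash
# Manually add a blob to the staging area under a file name
# (100644 is the mode for a regular, non-executable file)
git update-index --add --cacheinfo 100644 <blob-hash> <file-name>
```
This creates (or updates) the .git/index file.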
Committing Process
```bash
# Create a tree object from the current contents of the index (staging area)
git write-tree
```
This prints the hash of the new root tree, which is stored in the objects folder.
Inspect the tree:
```bash
# Check the tree's type and content
git cat-file -t <tree-hash>
git cat-file -p <tree-hash>
```
Create commit:
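```bash
# Commit the tree; -p names the parent commit (omit it for the very first commit)
git commit-tree -m "commit message" -p <parent-commit-hash> <tree-hash>
```
This prints the hash of the new commit object.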
Managing HEAD and Branches
Update branch pointer:
```bash
# Point master at the latest commit
echo <commit-hash> > .git/refs/heads/master
```
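Branch operations:
- Create a branch: add a file in .git/refs/heads/, named after the branch and containing a commit hash
- Switch branch: change the content of the HEAD file to ref: refs/heads/<branch-name> (this only moves HEAD; unlike git checkout, it does not update the working directory or the index)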
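Putting the plumbing steps together, the sketch below builds a first commit without git add or git commit. It assumes git is installed and runs in a throwaway repository. The file name and message are illustrative. In place of writing the ref file with echo, it uses git update-ref, a plumbing command for updating refs.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# 1. Store file content as a blob
blob=$(echo "git is awesome" | git hash-object --stdin -w)

# 2. Stage it under a path (100644 = regular, non-executable file)
git update-index --add --cacheinfo 100644 "$blob" hello.txt

# 3. Turn the index into a tree object
tree=$(git write-tree)

# 4. Wrap the tree in a commit (no -p: this is the first commit)
commit=$(git commit-tree -m "first commit" "$tree")

# 5. Point master at the commit; HEAD already says ref: refs/heads/master
git update-ref refs/heads/master "$commit"

git log --oneline   # shows the hand-built commit
```

For a follow-up commit, pass the previous commit's hash to git commit-tree with -p <parent-commit-hash>.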
Compression
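Git optimizes storage using zlib compression:
- Each object is compressed with zlib before it is written to .git/objects; the hash is computed on the uncompressed data
- zlib uses the DEFLATE algorithm, which combines LZ77 and Huffman coding
- Compression is lossless, so reading an object back returns exactly the original bytes while the repository takes up much less space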
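The compression is visible on disk: a loose object file is smaller than the content it stores, and it starts with a zlib header. This is a minimal sketch, assuming git is installed and head -c is available (it is in GNU and BSD userlands). The repetitive file big.txt is made up so that the size difference is obvious.

```bash
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

# 1000 identical lines of 15 bytes each: 15000 bytes in total
yes "git is awesome" | head -n 1000 > big.txt
hash=$(git hash-object -w big.txt)
obj=".git/objects/$(echo "$hash" | cut -c1-2)/$(echo "$hash" | cut -c3-)"

raw_size=$(git cat-file -s "$hash")                          # uncompressed content size
disk_size=$(wc -c < "$obj" | tr -d ' ')                      # compressed file size
first_byte=$(head -c 1 "$obj" | od -An -tx1 | tr -d ' \n')   # zlib header byte

echo "content: $raw_size bytes, on disk: $disk_size bytes, first byte: 0x$first_byte"
```

The first byte 0x78 is the standard zlib header byte for the default 32 KB window. git cat-file -s reports the uncompressed size of the object's content.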
Understanding Git's internal architecture helps you work more effectively with version control and troubleshoot issues when they arise.