git technical handbook

october 10, 2024

introduction

this guide is my attempt to construct a functional handbook for using git and learning a bit about how it works. to write this, I’ve extracted some tips I found helpful from the first version of the Pro Git book (licensed under CC by-nc-sa 3.0), and relied on my own experiences to furnish that framework. I mostly wrote this to formalize a reference for myself, but I hope it can be helpful for others interested in using git - beginners and experts alike!

motivation

version control is something of a technical public good, but these days most of us interface with version control when using cloud-based products such as google docs - or any other product that lets you undo or jump to a previous version of a file. those tools provide these conveniences at the cost of their users’ privacy (or money). we all deserve to undo in private, and for free.

using the commandline

this guide requires readers to run commands in their computer’s commandline interface and have some familiarity with navigating directories that way. if you haven’t done this before, the main commands you need to know to get started are ls (or dir in the default windows commandline interface) and cd:

ls               # list every non-hidden file in your current directory
cd exampleFolder # change your current directory to one of its already-existing
                 # subdirectories named "exampleFolder"
cd ..            # change your current directory to its parent directory

see the “installing git” section for some tips on how to get the git command working on your commandline interface.

note: anything after # is ignored by the computer on the commandline and is called a “comment”, and I may write some inline notes this way throughout the guide.

what is version control

“version control” refers to the idea of recording changes to a file or a set of files, so that you can recall specific versions later. a “version control system” (VCS) is a tool that implements version control. any product you’re familiar with that implements “undo”, “redo”, or lets you select a version of your work from a history of versions is implementing some form of version control.

git is a VCS that is not attached to any specific product, but is instead a commandline tool that can introduce version control to any directory and its contents. there are many commandline VCSs, but I’m personally most familiar with git, and there are a couple reasons why I continue to use it.

why git

git is pretty cool because it’s…

free and open-source!
distributed: anyone who clones a git repository will have a copy of the whole repository (and not just the most recent history), which enables you to easily work on the same directory across multiple machines or collaborate with others.
check-summed: everything in git is check-summed before it is persisted to storage, and those checksum hashes are used to identify everything in git, which makes git inherently robust in detecting file corruption.
fast and local: only operations that specifically interact with remote repositories rely on network transfer, which makes most operations in git super speedy.

git technical guide

this section introduces the taxonomy of git, as well as some key commands as they pertain to each section.

git repositories

a git repository is a directory that is tracked by git for the purposes of version control. before you can start recording changes to files, you must first create or clone a repository on your computer, or “locally”.

I like to use one repository per project - for example, I track all the files for my website in one repository.

if you want to track an existing local directory with git, you can do something like the following inside that directory:

git init

alternatively, if you want to start working on a git repository that exists somewhere not on your computer, or “remotely”, you can clone that on your computer. for example, let’s say you want to work on a copy of a decompiled mario 64 project that is available for anyone to clone online. you can do that by running the following command, and optionally renaming that project “mymario64” locally:

git clone https://github.com/n64decomp/sm64.git mymario64

once you’ve made or cloned a git repository locally, you can start to record versions of this directory to the local history of the repository.
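as a minimal sketch of the first option, the commands below create a throwaway directory, turn it into a repository, and confirm that git created its hidden .git directory there. the scratch directory from mktemp and the file name notes.txt are assumptions for illustration only:

```shell
set -eu
project="$(mktemp -d)"        # hypothetical scratch directory standing in for your project
cd "$project"
echo "hello" > notes.txt      # hypothetical file; nothing is tracked yet
git init -q                   # create the hidden .git directory that holds the history
ls -a                         # .git now appears next to notes.txt
inside="$(git rev-parse --is-inside-work-tree)"
echo "inside a git work tree: $inside"
```

deleting the .git directory would remove the history but leave your files alone, so trying this on a scratch directory is low-risk.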

configure git

one more thing you could do before you start committing is configure git to know who you are so your commit metadata can include your identity when you record new versions. you can configure git at the following levels:

a single repository (inside the directory, call git config)
all repositories for your computer user (git config --global)
all repositories for all users (git config --system)

for example, you can specify your identity in the following way:

git config user.name "mario"
git config user.email "mario@marioparty.blog"

note: no one will be verifying what you put here - this is just for yourself and your collaborators.

you can also specify certain preferences in your git config file, like what text editor you’d like to use for writing commit messages. or if you notice you’re using certain git commands a lot, you can also write shortcuts for those commands in your git config file. see git config --help for all configurable options.

to see all the settings pertinent to your repository, run git config --list, which compiles all settings from all config files.
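here is a small sketch of repository-level configuration, including an editor preference and a shortcut (git calls these “aliases”). the alias name st and the choice of nano are just examples, not recommendations:

```shell
set -eu
repo="$(mktemp -d)"                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"           # repository level: stored in .git/config
git config core.editor "nano"          # editor for commit messages (nano is only an example)
git config alias.st "status --short"   # shortcut: `git st` now runs `git status --short`
git config --local --list              # show only this repository's own settings
name="$(git config --get user.name)"
shortcut="$(git config --get alias.st)"
git st                                 # runs the alias; prints nothing in an empty repository
```

adding --global to any of the git config lines above would store the setting for all of your repositories instead.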

git commits

each recorded version is effectively a snapshot of the whole directory. those versions are called “commits”. each commit has a reference to its parent commit, or the version directly preceding it. a git commit remains very fast despite snapshotting the whole directory because if a file has not changed since a previous version, git does not store that file again, and instead references the unchanged file hash.

git commit structure

a commit stores a snapshot of the directory in the following way:

[figure: git commit structure]

in addition to metadata about the commit such as author, message, and date, this first commit also points to a version of the root project directory tree. the tree version points to the file versions, stored as “blobs”. each object (commit, tree, file blob) is identified by its checksum hash. finally, each commit also points to its parent commit, or the commit directly preceding it in the git history.

we refer to the chain of git commits, each pointing to the preceding commit, as the “git history” of the project directory.
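if you’d like to see those objects for yourself, the sketch below makes one commit in a scratch repository and prints the commit, its tree, and a file blob with git cat-file. the file name and contents are made up for the example:

```shell
set -eu
repo="$(mktemp -d)"                        # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"
git config user.email "mario@marioparty.blog"
echo "it's-a me" > README.md
git add README.md
git commit -q -m "first commit"
git cat-file -p HEAD                       # commit object: tree hash, author, committer, message
git cat-file -p 'HEAD^{tree}'              # tree object: one entry per file, pointing at a blob
git cat-file -p HEAD:README.md             # blob object: the file contents themselves
commit_type="$(git cat-file -t HEAD)"
tree_type="$(git cat-file -t 'HEAD^{tree}')"
blob_type="$(git cat-file -t HEAD:README.md)"
echo "$commit_type -> $tree_type -> $blob_type"
```

a second commit would additionally print a “parent” line in the commit object, which is the link that forms the git history.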

committing a change

the flow of using git to track versions of your project looks something like this:

[figure: git local operations]

1. [optional] load your working directory by checking out a branch of the repository
2. modify files (e.g. edit a text file in the directory with your favorite text editor)
3. stage them for commit
4. finally, commit them to the repository so that version of the directory is recorded

step 1 is only important if you’re working with multiple branches (more on branches later). steps 3 and 4 look something like the following:

git add *.txt                         # stage all files in the root repository directory
                                      # ending in the .txt extension for commit
git add README.md                     # stage README.md in the root repository directory
git commit -m "updated documentation" # commit the version of the project, specifying
                                      # the commit message in the same line

determining file status

by default, files added to the git repository are not tracked by git until they are staged. files in the git repository can be in any of the following states:

untracked: file is not tracked by git and therefore not versioned
unmodified: file is tracked by git and has not been changed since the last time the file was committed
modified: file is tracked by git and has been changed since the last time the file was committed
staged: file is tracked by git, has been changed since the last time the file was committed, and has been added to the staging area to be included in the next commit

[figure: git file status lifecycle]

to determine which files are in which state, you can run the following command:

git status

for details on the changes to “modified” files since the last time they were staged or committed, you can run the following command:

git diff

for details on the changes to “staged” files since the last commit, you can run the same command with the following option:

git diff --staged

note: it’s possible that the same file can be both “modified” and “staged”. this can result from doing the following:

modifying a file in the working directory
staging the file for commit
modifying the same file again

in this case, there is a version of the file that has been staged and will be included in the next commit, as well as a version which has not been staged yet. the changes in the working directory can only be committed once they’ve been added to the staging area.
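the three steps above can be reproduced in a scratch repository. this is only a sketch, and the file name script.txt is an assumption. I use git status --porcelain here because its two-letter codes are stable: the first letter describes the staging area and the second describes the working directory.

```shell
set -eu
repo="$(mktemp -d)"                   # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"
git config user.email "mario@marioparty.blog"
echo "draft 1" > script.txt
git add script.txt
git commit -q -m "add script"
echo "draft 2" > script.txt           # modify the file
git add script.txt                    # stage that version
echo "draft 3" > script.txt           # modify it again without staging
status="$(git status --porcelain)"
echo "$status"                        # first M = staged change, second M = unstaged change
git diff --staged                     # draft 1 -> draft 2: what the next commit would record
git diff                              # draft 2 -> draft 3: still only in the working directory
```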

viewing git history

use the command git log to view a simple list of commits and basic metadata (author, date, checksum hash, and message) in reverse chronological order for your git repository.

typically that’s all you’ll ever need, but there are a handful of helpful arguments you can pass in to that command to make viewing git history easier or more fun. I’ll list some below - generally you can mix and match the arguments here, and you can check out git log --help for more.

git log -p -3                                # show the changes from the last 3 commits
git log --pretty=format:"%h - %an, %ar : %s" # specify your own format - in this case,
                                             # format string placeholders will be
                                             # replaced with the following: abbreviated
                                             # commit hash, author name, author date
                                             # (relative), commit message. check out
                                             # docs for more options
git log --author "mario"                     # show commits from a specific author

remember: if there’s a command with complicated arguments that you like, you can always add a shortcut for it in your git config file.
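for instance, here is a sketch of turning the custom format above into a shortcut. the alias name lg is my own choice, and the two empty commits only exist so there is some history to print:

```shell
set -eu
repo="$(mktemp -d)"                   # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
# store the long command once; `git lg` expands to it from now on
git config alias.lg 'log --pretty=format:"%h - %an, %ar : %s"'
latest="$(git lg -1)"                 # extra arguments are passed through to git log
echo "$latest"
```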

undoing things in git

revert a commit

if you want to undo an entire commit (add a new commit to the git history which negates the work done in a given commit), you can run the following command with the commit hash you want to undo:

git revert 09d60
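a quick sketch of what that looks like end to end, using a made-up file level.txt and reverting HEAD rather than a specific hash:

```shell
set -eu
repo="$(mktemp -d)"                   # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"
git config user.email "mario@marioparty.blog"
echo "v1" > level.txt
git add level.txt
git commit -q -m "add level"
echo "v2" > level.txt
git commit -q -am "change level"
git revert --no-edit HEAD             # add a new commit that undoes "change level"
git log --oneline                     # three commits: the original two plus the revert
content="$(cat level.txt)"
count="$(git rev-list --count HEAD)"
```

note that nothing is removed from the history: the reverted commit is still there, followed by a commit that negates it.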

edit a commit

if you’re quite fastidious about your commits and want to edit your last commit to either overwrite the commit’s message or include the changes in your staging area, you can run one of the following commands:

git commit --amend --no-edit               # replaces last commit to also include staged
                                           # changes
git commit --amend                         # same as above, but allows you to edit the
                                           # commit message as well
git commit --amend -m "new commit message" # same as above, but overwrites the commit
                                           # message inline

note: when you amend a commit, the commit checksum hash will change in your git history.
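to see the hash change for yourself, here is a sketch that amends a commit in a scratch repository (the file names a.txt and b.txt are placeholders):

```shell
set -eu
repo="$(mktemp -d)"                   # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"
git config user.email "mario@marioparty.blog"
echo "a" > a.txt
git add a.txt
git commit -q -m "add a"
before="$(git rev-parse HEAD)"
echo "b" > b.txt
git add b.txt                         # stage the change that was forgotten
git commit -q --amend -m "add a and b"
after="$(git rev-parse HEAD)"
echo "before: $before"
echo "after:  $after"                 # a different hash, but still only one commit
```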

unstage a change prior to a commit

if you want to simply move a file from the staging area back to the working directory so those changes aren’t included in your next commit, you can run the following command:

git reset HEAD <filename>

revert a specific file back to a previous commit

if you want to reset a file to a previous version in the working directory, you can do so by running the following command:

git checkout <commit hash or branch name> -- <filename>
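the last two commands can be tried together in a scratch repository. in this sketch script.txt is a hypothetical file, and HEAD stands in for the commit you want to restore from. be aware that the checkout form throws away uncommitted changes to that file.

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"
git config user.email "mario@marioparty.blog"
echo "draft 1" > script.txt
git add script.txt
git commit -q -m "add script"
echo "draft 2" > script.txt
git add script.txt
git reset -q HEAD script.txt                 # unstage: the edit stays in the working directory
after_unstage="$(git status --porcelain)"
echo "$after_unstage"                        # modified, but no longer staged
git checkout HEAD -- script.txt              # restore the committed version of the file
content="$(cat script.txt)"
```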

git remotes

one major draw of git is that it can be easily distributed. it’s easy to keep copies of the same repository in multiple locations, even multiple computers. there are a couple reasons why you might be interested in distributing your repository:

redundancy: if something were to happen to your computer, the entire history of your project (except any history you haven’t copied over from your computer yet) will still be available somewhere else. your work and git history will not be lost.
collaboration: all collaborators contributing to the same repository can work on their own local copy of the repository, while also easily updating their copy of the repository with other collaborators’ work.

you can easily update remote repositories with the work from your local repository (or vice versa) by configuring git on your repository to store the “remote” locations of where other copies of the repository live.

show remotes

to see the remote locations you currently have configured, you can run one of the following commands:

git remote     # show remote short codes
git remote -v  # show remote short codes and the URLs they expand to, along with
               # network transfer protocol used

if your repository was “cloned”, you will see at least the short code for that remote repository listed, which is by default called origin.

add a remote

you can add a remote URL (in this case using the “git” network transfer protocol - check out the protocols section for more details) and short code with the following command:

git remote add team0 git://git.team0.com

fetch or pull from remote

from then on, you can update your local repository with the remote repository, including any new branches or tags, with the following:

git fetch team0

after fetching a remote repository, you will typically want to merge work from one branch from the updated remote repository into a branch in your local repository.

alternatively, if your current branch “tracks” a remote branch (see the “branches” section for more details), you can also fetch and merge into your current branch by running the following command:

git pull

push to remote

if a remote repository is configured to allow you to write to it, you can also update that remote repository to include your local git history by pushing your current branch to a branch on the remote repository. for example, if you’re pushing to the main branch on team0, you can run the following command:

git push team0 main

or if your current branch is already tracking a remote branch, you can leave out the remote shortcode and branch name and just run git push.

inspect remotes

to inspect your remote branches and see whether they’re currently being tracked by a local branch, you can run one of the following commands:

git remote show
git remote show team0   # specify remote shortcode to only show information for that remote

rename or remove remotes

if you would like to rename your remote shortcode, you can do so with the following command:

git remote rename team0 team

lastly, if you would like to remove a reference to a remote repository, you can do so with the following:

git remote rm team
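you don’t need a server to practice these commands. as a sketch, the script below uses a “bare” repository in a scratch directory as a stand-in for the team0 remote (a local path instead of a git:// URL, which is my substitution), then walks through add, push, fetch, rename, and remove:

```shell
set -eu
work="$(mktemp -d)"                          # hypothetical scratch directory
git init -q --bare "$work/team0.git"         # stand-in for a remote server
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "first commit"
git remote add team0 "$work/team0.git"
git remote -v                                # lists team0 with its fetch and push locations
git push -q team0 main                       # copy local history to the remote
git fetch -q team0                           # nothing new to download, but harmless
git remote rename team0 team
remotes="$(git remote)"
git remote show team                         # details, including tracked branches
git remote rm team
```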

git branches

a branch is a simple file containing a reference to the last commit added to the git history at the time the branch was created, or while that branch was the “current branch”.

let’s say you perform the following actions in git:

1. create a new branch dialogue
2. switch from main to that branch, in other words make the dialogue branch your “current branch”
3. add a commit to the git history

that new dialogue branch is simply a reference to that new commit added in step 3. because that commit points to its parent commit, the new commit and branch are still connected to the full git history of the repository. when you switch back to the main branch, however, the last commit shown in git log will be the parent of that new commit.

because branches are effectively simple files or references to the last added commit while that branch was made the “current branch”, they’re really fast and easy to create and destroy, and thus easy to integrate into your workflow.
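you can peek at those files directly. this sketch assumes git’s default storage, where a freshly created branch is a loose file under .git/refs/heads (git can also pack references or use other storage formats, in which case the files look different):

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "first commit"
git branch dialogue
cat .git/HEAD                                # which branch is current
cat .git/refs/heads/dialogue                 # the branch: just a commit hash
branch_file="$(cat .git/refs/heads/dialogue)"
tip="$(git rev-parse HEAD)"
```

creating the branch wrote one tiny file, which is why branches are so cheap.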

using branches

let’s say we’re happily committing code to the main branch, and we want to experiment with writing dialogue for our project - for instance, coming up with a dialogue system in a video game. while our work on the dialogue is in-progress, we may want to keep that work separate from the main branch for a variety of reasons.

you can create a new git branch called dialogue with the following command:

git branch dialogue

this creates a new branch that references the last commit, which we can think of as looking like the following:

[figure: git simple branch, before committing]

notice that HEAD points to the main branch. HEAD is another simple file that git uses to track what the “current branch” is. so let’s say you want to do the following:

add a commit to main (i.e. update the README.md file describing your project)
switch to your new dialogue branch to commit a change with your initial dialogue script

to accomplish this in git, you would do something like the following:

git add README.md
git commit -m "add README to describe the project"
git checkout dialogue
git add dialogue/script-draft-1.txt
git commit -m "add first draft of dialogue script"

after those commands, we’ve added a new commit on each branch, but the dialogue branch is not aware of the commit containing the README.md changes, and similarly the main branch is not aware of the first draft of the dialogue script. HEAD is also now pointing to the dialogue branch, indicating that that is your “current branch”, and new commits will be added to that branch of git history.

[figure: git simple branch, after committing on both branches]

branch management

you can use branches in whichever way makes the most sense for your workflow. two common ways to use branches are listed below:

topic branches: short-lived branches created and used for a cohesive and relatively small piece of work, for example dialogue from the previous section. you may want to use topic branches to facilitate getting feedback from collaborators before integrating your work into the main branch, or to maintain your ability to undo all your work pertaining to a cohesive concern.
long-running branches: branches that track long-living versions of your work, for example main. you could also keep more long-living branches for tracking different stages of drafts. for example, you could keep a branch called “polished” which you periodically update with the work from main at points where you feel ready to receive external feedback, and showcase your progress using that branch even while you continue to experiment on your main branch. another case where you might want to use long-living branches is tracking localizations of your work - spanish translation, english, german… etc.

below are a few commands that are helpful for inspecting and managing your branches locally.

git branch                 # list branches. * indicates `HEAD`
git branch -v              # list branches with last commit
git branch dialogue        # create `dialogue` branch
git checkout dialogue      # switch to `dialogue` branch
git checkout -b dialogue   # create and switch to `dialogue` branch
git branch -d dialogue     # delete `dialogue` branch

moving commits from one branch to another

git merge

once you’ve found a good solution for your dialogue, you may want to integrate that work back into the main branch. you can do so in one of two ways - a “merge” or a “rebase”. you can “merge” your work into main by running the following commands:

git checkout main
git merge dialogue
git branch -d dialogue  # optional: delete unused topic branch

in this case, git will add a new commit to main which includes the changes made on dialogue since the two branches’ common ancestor. the new commit will have two parents: the last commit on main and the last commit on dialogue. we can visualize this to look something like the following:

[figure: git merge]
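here is a sketch that builds the same shape in a scratch repository (one commit only on main, one only on dialogue) and checks that the merge commit really has two parents. file names are made up for the example:

```shell
set -eu
repo="$(mktemp -d)"                                   # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main                 # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "common ancestor"
git branch dialogue
echo "readme" > README.md
git add README.md
git commit -q -m "add README to describe the project"   # only on main
git checkout -q dialogue
echo "draft" > script-draft-1.txt
git add script-draft-1.txt
git commit -q -m "add first draft of dialogue script"   # only on dialogue
git checkout -q main
git merge -q --no-edit dialogue                       # creates the merge commit
git log --oneline --graph                             # draws the two lines of history joining
# rev-list prints the commit hash followed by its parents; subtract 1 for the commit itself
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of the merge commit: $parents"
```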

fast-forward merge

if commits were only added to dialogue and not to main (so C4 in the above diagram was never added and the git history effectively had no branches), then the merge actually does not require an additional commit on main. in that case, dialogue would be directly ahead of main (main’s last commit is an ancestor of dialogue’s last commit), and so main can just “fast-forward” to the commit that dialogue is referring to by updating its commit reference to that commit (“C5” or “b3d90”, from the diagram).
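a sketch of the fast-forward case follows. I pass --ff-only so that git refuses to do anything other than a fast-forward, which makes the behavior easy to verify; a plain git merge would fast-forward here too.

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "first commit"
git checkout -q -b dialogue
echo "draft" > script.txt
git add script.txt
git commit -q -m "add first draft of dialogue script"
tip="$(git rev-parse dialogue)"
git checkout -q main                         # main has no new commits of its own
git merge --ff-only dialogue                 # main simply moves forward to dialogue's commit
main_tip="$(git rev-parse main)"
```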

git rebase

the other way to integrate your work back into the main branch is to perform a “rebase” before performing a “merge”.

let’s say we have the same branches - main and dialogue, where each branch has unique commits. we can perform a rebase by running the following commands:

git checkout dialogue
git rebase main

rebasing dialogue onto main will effectively replay all the commits that are only on dialogue on top of the main commits. we can visualize this as looking like the following:

[figure: git simple rebase]

unlike a “merge”, a “rebase” will replace commits that were formerly unique to the dialogue branch with equivalent commits in the main line of git history. with dialogue now directly ahead of main, we can perform the “merge” without any additional commits because main can just fast-forward to dialogue, keeping the git history in main very clean and easy to understand.

git checkout main
git merge dialogue
git branch -d dialogue    # optional: delete unused topic branch
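putting the rebase and the fast-forward merge together, here is a sketch in a scratch repository with one unique commit on each branch (file names are placeholders). the resulting history on main should be a straight line with no merge commit:

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "common ancestor"
git checkout -q -b dialogue
echo "draft" > script.txt
git add script.txt
git commit -q -m "add first draft of dialogue script"
git checkout -q main
echo "readme" > README.md
git add README.md
git commit -q -m "add README to describe the project"
git checkout -q dialogue
git rebase -q main                           # replay dialogue's commit on top of main
git checkout -q main
git merge -q --ff-only dialogue              # fast-forward: no merge commit needed
git branch -d dialogue
git log --oneline --graph                    # one straight line of three commits
merges="$(git rev-list --merges --count HEAD)"
```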

rebasing just some commits

one cool thing about “rebase” is you don’t have to incorporate the full branch from its inception into main. you can also specify the point after which git should start taking commits from that branch. for example, let’s say we create a new branch off of dialogue called fix, which fixes an issue not only in the dialogue branch but also in the main branch. let’s say you also decide you want to have the fix applied to the main branch before you’re ready to merge dialogue in. in that case, you can run something like the following:

git rebase --onto main dialogue fix
git checkout main
git merge fix
git branch -d fix                      # optional: delete unused topic branch

we are able to apply the commits from fix starting from the common ancestor that branch has with the dialogue branch.

[figure: git complex rebase]
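the sketch below sets up that three-branch situation in a scratch repository and checks the outcome: main ends up with the fix, but not with the unfinished dialogue work. file names are assumptions for the example.

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "common ancestor"
git checkout -q -b dialogue
echo "draft" > script.txt
git add script.txt
git commit -q -m "dialogue work in progress"
git checkout -q -b fix                       # fix branches off of dialogue
echo "patched" > fix.txt
git add fix.txt
git commit -q -m "fix an issue"
git checkout -q main
echo "readme" > README.md
git add README.md
git commit -q -m "main work"
git rebase -q --onto main dialogue fix       # replay only the commits in fix that are not in dialogue
git checkout -q main
git merge -q --ff-only fix
git branch -d fix
ls                                           # README.md and fix.txt, but no script.txt
```

the dialogue branch is untouched by all of this and can still be rebased or merged later.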

squashing commits

you can reorganize or edit commits using the interactive option on the rebase command. for example, you can squash a bunch of commits into a single, new commit.

git checkout dialogue
git rebase -i main

the above commands start an interactive rebase, which will open in your text editor a file including all the commits in dialogue’s git history since it branched off from main, with instructions in a comment on how to edit those. that might look something like the following:

pick b3d90 add first draft of dialogue script
pick f5e79 fix typo

# Rebase dialogue onto main
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit

to squash the commits into one, make the following edits. once you save and exit, the two commits will be replaced by a single commit with the changes from both.

pick b3d90 add first draft of dialogue script
squash f5e79 fix typo

# Rebase dialogue onto main
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
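if you want to try a squash without an editor popping up, the sketch below scripts the same edit. GIT_SEQUENCE_EDITOR and GIT_EDITOR are environment variables git reads to decide which program edits the rebase instructions and the commit message; using sed and true for them is my stand-in for doing the edit by hand, not something you would normally need.

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "start of main"
git checkout -q -b dialogue
echo "helo" > script.txt
git add script.txt
git commit -q -m "add first draft of dialogue script"
echo "hello" > script.txt
git commit -q -am "fix typo"
# sed changes "pick" to "squash" on line 2 of the instruction file;
# `true` accepts the combined commit message without opening an editor
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true git rebase -q -i main
git log --oneline                            # the two dialogue commits are now one
squashed="$(git rev-list --count main..dialogue)"
```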

rebase warning

if you can, avoid “rebase” strategies on commits you have already pushed to a remote branch other collaborators are working on. as mentioned, “rebasing” effectively replaces commits with new, equivalent commits, meaning their checksum hash identifiers will be different. therefore, if collaborators pushing to and fetching from the same branch already had the old versions of those commits in their local repository’s git history, their local branch will have diverged from the rewritten remote branch: their next pull can no longer fast-forward and may produce duplicate commits or conflicts, and they may need to do a little extra work to get out of the situation.

luckily, when you push rewritten git history to a remote branch that already has the old commits, git rejects the push by default, and you must specify --force as an argument to proceed. this may be a situation where you pause to think about whether you really want to overwrite remote history. if it’s a remote branch only you know about and you trust that you did the rebase correctly, let it rip.

cherry-pick

to apply the changes introduced by specific commits from other branches, use the cherry-pick command. in the “git rebase” section, we rebase the dialogue branch onto main, and then merge dialogue into main. you could feasibly achieve something similar by cherry-picking the commit(s) from dialogue onto main:

git checkout main
git cherry-pick b3d90

the above commands create a new commit on main with the changes in commit b3d90, which was previously only on the dialogue branch. like rebasing, this produces a new commit hash. unlike rebasing, the original commit stays on dialogue, so merging dialogue into main later would leave both versions of the same change in the git history, and can result in a merge conflict if either branch has since changed the same lines.
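as a sketch, the script below cherry-picks a commit from dialogue onto a main branch that has moved on in the meantime, then compares the hashes. the file names are invented, and the commit is referred to through a shell variable rather than a literal hash like b3d90:

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "common ancestor"
git checkout -q -b dialogue
echo "draft" > script.txt
git add script.txt
git commit -q -m "add first draft of dialogue script"
original="$(git rev-parse HEAD)"             # the commit we want to copy
git checkout -q main
echo "readme" > README.md
git add README.md
git commit -q -m "main work"
git cherry-pick "$original"                  # apply the same change as a new commit on main
copy="$(git rev-parse HEAD)"
echo "original: $original"
echo "copy:     $copy"                       # same change and message, different hash
```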

merge conflicts

when merging a commit or branch of commits into a branch, you may receive an error that looks something like the following:

CONFLICT... automatic merge failed; fix conflicts and then commit the result.

this means that git had trouble merging one or more commits, for example because the same line in a file was changed by both branches. in this case, you must open the “unmerged” files and edit them to no longer have conflicts. the conflicts will be annotated with something like the following:

<<<<<<< HEAD: README.md
project contributors include: mario, luigi
=======
project contributors include: mario, peach
>>>>>>> fix: README.md

in this case, both branches added a different contributor to the README.md project description. the “current branch” or HEAD that you’re merging work into saw “luigi” added, and the fix branch you’re trying to merge in saw “peach” added. you can edit these lines to either only include one or the other of the changes, or you can replace them with a combination of both - whatever makes sense for what you want to have in the main branch.

in this case, we want to list both peach and luigi as contributors since they both joined the project, so we can edit the above to look something like this:

project contributors include: mario, peach, luigi

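once you’ve resolved the conflicts in the file and saved your changes in the working directory, add your files to the staging area and conclude the merge by committing your change (if the conflict came up during a rebase or cherry-pick, run git rebase --continue or git cherry-pick --continue after staging instead):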

git add README.md
git commit
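to practice on something harmless, this sketch recreates the luigi and peach conflict in a scratch repository and resolves it. I resolve by overwriting the file with echo, which stands in for editing it by hand, and I pass --no-edit so git uses its default merge message instead of opening an editor:

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
echo "project contributors include: mario" > README.md
git add README.md
git commit -q -m "add README"
git checkout -q -b fix
echo "project contributors include: mario, peach" > README.md
git commit -q -am "add peach"
git checkout -q main
echo "project contributors include: mario, luigi" > README.md
git commit -q -am "add luigi"
git merge fix || echo "merge stopped on a conflict"   # a non-zero exit is expected here
unmerged="$(git diff --name-only --diff-filter=U)"    # files that still have conflicts
cat README.md                                         # shows the conflict markers
echo "project contributors include: mario, peach, luigi" > README.md
git add README.md                                     # mark the conflict as resolved
git commit -q --no-edit                               # conclude the merge
```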

remote branches

remote branches are automatically updated on network communication with the remote repository, for example via a “fetch” or “push”. remote branches are effectively bookmarks for where branches on the remote repository were the last time you connected over the network. we’ve already gone over some helpful remote operations, but now that we’ve gone over branches, below are some helpful remote branch management commands:

git push origin fix                   # push local branch `fix` to remote repo "origin"
git push origin fix:colorFix          # same as above, but call remote branch "colorFix"
git checkout --track origin/colorFix  # checkout new branch that tracks remote branch with same name
git checkout -b cFix origin/colorFix  # same as above, but call the local tracking branch "cFix"
git push origin :colorFix             # delete remote branch called "colorFix"
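these commands can also be tried against a local stand-in for origin. in this sketch a bare repository in a scratch directory plays the role of the remote (my assumption, so no server is needed), and the branch names match the listing above:

```shell
set -eu
work="$(mktemp -d)"                          # hypothetical scratch directory
git init -q --bare "$work/origin.git"        # stand-in for the remote repository
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "first commit"
git push -q origin main
git checkout -q -b fix
git commit -q --allow-empty -m "fix colors"
git push -q origin fix:colorFix              # remote branch is named "colorFix"
git checkout -q -b cFix origin/colorFix      # local branch that tracks the remote branch
upstream="$(git rev-parse --abbrev-ref 'cFix@{upstream}')"
git branch -vv                               # shows which remote branch each local branch tracks
git checkout -q main
git push -q origin :colorFix                 # delete the remote branch
remaining="$(git ls-remote --heads origin)"
echo "$remaining"                            # only main is left on the remote
```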

undoing git mistakes

in addition to recording the history of the repository, git also records the history of git references such as tips of branches in a reference log, or “reflog”. this means you can effectively undo git commands, not just undo versions of the project. if you accidentally remove a commit from your git history, for example, you can get your repository back to the state it was in before that mistake by checking out the reference prior to that action. you could also inspect what a branch looked like, say, one week ago. it’s pretty powerful stuff and although I don’t use it often, knowing the reflog exists makes me feel less nervous about rewriting my git history (rebasing branches, amending or squashing commits, etc).

for example, let’s say you run the following somewhat dangerous command (don’t try this at home, unless you’re sure everything in your working directory is committed):

git reset --hard HEAD^

this effectively changes the current branch tip to point to the commit previous to HEAD (that’s what the ^ annotation means), and effectively removes the last commit from your git history. if you realize that you actually didn’t want to do that, you can look into your reflog and undo that change to the tip of the branch by running git reflog, which may output something like the following:

1b74d4e... HEAD@{0}: reset --hard HEAD^: updating HEAD
a6e8234... HEAD@{1}: commit: fix

you can see HEAD 0 moves ago is resulting from our reset --hard command, and HEAD 1 move ago is resulting from a commit, which we just removed from our git history. we can get that commit back in our git history by running the following command:

git reset --hard a6e8234

note: the reflog only applies to references that git is aware of. unfortunately, if you’ve lost changes that were in your working directory that have not yet been committed or staged, that would not be recoverable by git. by default, git also only keeps entries in the reflog for a limited time (90 days, or 30 days for entries that are no longer reachable from the current branch tip), so any branch reference changes older than that would also not be recoverable.
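here is the whole mistake-and-recovery loop as a sketch you can run safely in a scratch repository. instead of copying a hash out of the reflog output, it uses the HEAD@{1} notation, which means “where HEAD was one move ago”:

```shell
set -eu
repo="$(mktemp -d)"                          # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "mario"
git config user.email "mario@marioparty.blog"
git commit -q --allow-empty -m "first commit"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "fix"
lost="$(git rev-parse HEAD)"
git reset -q --hard HEAD^                    # oops: the "fix" commit is gone from the history
git reflog -2                                # newest entry is the reset, the one before is the commit
git reset -q --hard 'HEAD@{1}'               # move the branch back to where it was before the reset
recovered="$(git rev-parse HEAD)"
```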

using git across multiple computers

we’ve talked about remote repositories. below I’ll include some information on the different protocols that git supports for remote operations, which may be helpful context when cloning an existing remote repository onto your own computer.

network protocols

let’s briefly talk about the protocols git supports for remote operations. knowing a little bit about network protocols may help inform how you interact with remote branches, but only becomes important when configuring your own remote git repository.

local

all collaborators have a shared mounted file system. if you all have access to the same directories locally, you can clone a repository in the following way:

git clone /opt/git/mario64.git         # use hard links, copy files as needed

or if you’d prefer a process more similar to a clone over the network (which would not make any assumptions about what your local directory may already have access to), you can clone the same repository in the following way:

git clone file:///opt/git/mario64.git  # similar to network transfer

after cloning, you can add the original repository as a remote reference as usual:

git remote add mario64_local /opt/git/mario64.git

SSH

a remote server has SSH set up, and collaborators perform remote git operations on that server over SSH. SSH is historically one of the easiest network protocols to set up for write access (i.e. “git push” permissions) in addition to read access (i.e. “git fetch” permissions), so it’s probably the most common over-the-network protocol to use. cloning a repository typically looks like the following, where “user” is an SSH user on the specified server:

git clone ssh://user@marioparty.blog/git/mario64.git

git

a remote server is running a special process packaged with git, which allows for unauthenticated network traffic on port 9418. since this protocol does not support authentication, it’s typically used for read access only, especially if the remote server is open to the public internet on that port. cloning a repository would look like the following:

git clone git://marioparty.blog/git/mario64.git

HTTP/S

a remote server accepts HTTP/S traffic on standard HTTPS ports and allows for HTTP authentication mechanisms if supported by the server’s version of git, or simply serves the git repository files like normal files from the web browser in a read-only manner. cloning a repository would look like the following:

git clone https://github.com/n64decomp/sm64.git mymario64

apparently as of git 1.6.6, a new HTTP protocol was introduced that made writing to a remote repository easier. I wouldn’t know anything about that, but sounds cool!

installing git

C:\Users\fiona>git
'git' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\fiona>

if git is not already installed and on your path (i.e. running git commands on your commandline interface like terminal throws an error “command not found”, or similar), you can download the git binary specific for your operating system at git-scm, or use an operating system package manager of your choice to download git.

there are all sorts of opinions about the best way to install git using some package manager. as someone who used to develop mostly on mac (which has a very Linux-like commandline interface) and now develops on windows 10, I have been using git bash, which also comes with - you guessed it - a Linux-like commandline interface. I hear the default windows commandline on windows 11+ is more Linux-like, so maybe I would have chosen to download the git binary directly if I were using that operating system? on mac, I believe a version of git comes with XCode “command line tools”, which you can download from the app store.
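once you’ve installed git one way or another, a quick check like the following can confirm it’s on your path. this sketch assumes a Linux-like commandline such as git bash or the mac terminal, since command -v is a shell built-in there:

```shell
# look for a program named git on the path without running it
if command -v git >/dev/null 2>&1; then
  git --version                       # prints the installed version
  git_found=yes
else
  echo "git is not on your path yet"
  git_found=no
fi
echo "git found: $git_found"
```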

that’s all folks

if you have any thoughts or feedback on this breakdown of git, I’d love to hear from you over email. also, despite the length of this post, I ended up cutting out the following topics for the sake of brevity:

git tags
setting up your own git server
writing articles and books using git (it’s not just useful for programming!)
an opinionated guide on how to collaborate remotely with git
more on linux/unix operating system commandline

if you’d be interested to hear more about any of those, I’d love to know that too. until next time~
