Git basics#
This is from a lunch-n-learn I hosted. Yet another intro to git is not necessary on the internet, but I wanted to store mine someplace easily accessible for when I need it again. This introduction is geared towards Software Engineers and touches on some of the inner workings/concepts git employs.
Intro#
Distributed Version Control System (VCS)
The dominant source code management tool
Git != Github
Git is open source
GitHub is a hosting service for git repositories, with a bunch of extra features on top
git has a steeper learning curve than other VCSs, but in exchange gives you a lot more control
There are many GUI applications for git, but it's valuable to understand what these apps are doing under the hood, so this intro uses the command line interface (CLI)
Distributed?#
Centralized (e.g. Subversion/SVN) vs distributed (Git)
[Figure: https://k21academy.com/wp-content/uploads/2020/07/dvcs.png]
History#
Originally written by Linus Torvalds, creator of Linux, starting in 2005
Built to address the problems with Version Control Systems that affected Linux development the most
fast merging
strong safeguards against corruption
distributed workflow
Git describes itself as “the stupid content tracker”
Getting started#
Create a new repo:
git init
Download an existing repo for the first time:
git clone <git_url>
The 3 States of a file#
[Figure: https://jwiegley.github.io/git-from-the-bottom-up/images/lifecycle.png]
Viewing what stage things are in:
git status
Note: git status is one of git's most useful commands! It provides a lot of information on the current state of your repository
Working tree / working directory#
Where all changes live at first. The directory you see on your machine. 2 types of files in the working directory:
Tracked files: files git knows about. These are files that were in the last commit or have since been staged
Untracked files: any files in the working directory that have never been committed and are not staged
Staging area#
Also called the index
A spot to register changes from the working directory in preparation for a commit. A middle ground that's unique to git
The version of a file in the staging area is separate from the same file in your working directory
Taking a file from Working Dir → staging: git add <file>
Easiest to do in a visual editor when you're making a more complex index/commit (Example: VSCode)
Commits in the repository#
Commits are snapshots of the entire repository
changes in the staging area turn into a commit with git commit
shortcut if you don't need a full editor for the commit message: git commit -m "commit message..."
commits build off one another to create the repository’s “history”
Helpful way to view history:
git log
Commits as a visual graph:
git log --oneline --graph
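To watch a file move through these states, here is a minimal sketch. It assumes a throwaway repo made with mktemp; the file name notes.txt and the "Demo" identity are made up for illustration.

```shell
# Throwaway repo so nothing real is touched
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo"               # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "hello" > notes.txt
untracked=$(git status --short)           # "?? notes.txt": in the working tree, untracked

git add notes.txt
staged=$(git status --short)              # "A  notes.txt": staged in the index

echo "more" >> notes.txt
both=$(git status --short)                # "AM notes.txt": staged copy plus a newer unstaged edit

git add notes.txt
git commit -q -m "add notes"
clean=$(git status --short)               # empty: nothing left to stage or commit

git log --oneline --graph                 # one commit so far
```

The "AM" line is the interesting one: the index holds the first version of the file while the working directory already holds a newer one.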
How commits work#
Under the hood git has 3 fundamental objects:
blob: “Binary Large Object”. Just the contents of a file
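tree: a directory listing that may contain blobs or more trees (imagine ls as an object)
commit: a snapshot of the repository. It points to one top-level tree and records the author, message and parent commit(s)
(There is also a tag object, and packfiles are used as a compressed storage format, but neither is important for this discussion)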
[Figure: https://www.freecodecamp.org/news/content/images/2020/12/image-37.png]
As a directory:
DOCS/
    PIC.PNG
    1.TXT
TEST.JS
Adding another commit#
[Figure: https://www.freecodecamp.org/news/content/images/2020/12/image-41.png]
Each commit is a full snapshot of the repository, not just a diff. Changed files get new blobs (and new trees above them). Unchanged blobs/trees are not copied: the new commit simply points to the same objects that prior commits use
Aside (not important for typical git usage)
You can prove the above model to yourself by using git cat-file within a repo
Start with git cat-file -p <COMMIT_SHA>
commits point to a tree: <SHA>
git cat-file -p <TREE_SHA> can display what a tree object looks like
from there you can get more tree or blob SHAs
git cat-file -p <BLOB_SHA> just prints out the file itself
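The walk described above can be sketched end to end. This assumes a throwaway repo whose layout loosely mirrors the example directory (docs/1.txt and test.js); the file contents and the "Demo" identity are invented for the demo.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo"                   # placeholder identity, local to this repo
git config user.email "demo@example.com"

mkdir docs
echo "hello" > docs/1.txt
echo "console.log('hi')" > test.js
git add .
git commit -q -m "first commit"

commit_sha=$(git rev-parse HEAD)
git cat-file -p "$commit_sha"                 # shows "tree <SHA>", author, committer, message

tree_sha=$(git rev-parse "HEAD^{tree}")
git cat-file -p "$tree_sha"                   # entries: a tree for docs, a blob for test.js

blob_sha=$(git rev-parse HEAD:docs/1.txt)
blob_type=$(git cat-file -t "$blob_sha")      # -t prints the object type
blob_content=$(git cat-file -p "$blob_sha")   # -p prints the file contents
```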
SHAs#
SHA = Secure Hash Algorithm
A 40-character hexadecimal SHA-1 hash is used to identify every object in a git repository quickly (newer git versions can optionally use SHA-256, which is 64 characters)
these are the little tags of random-looking numbers/letters in the above images (e.g. 62E7A). In the images they are abbreviated from 40 characters for simplicity
Concept of hashing algorithms:
for any amount of input data, generate a fixed-length hash (the 40-character code used by git)
repeatable: when fed the same data, you get the same hash
unique: in practice, 2 different inputs should not generate the same hash
if your input data changes by only 1 bit, the resulting hash will be completely different
See https://en.wikipedia.org/wiki/Cryptographic_hash_function
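These properties are easy to check with git hash-object, which prints the ID git would give a piece of content. A small sketch, assuming git's default SHA-1 object format; the input strings are arbitrary.

```shell
# Same content in, same ID out
first=$(printf 'hello\n' | git hash-object --stdin)
second=$(printf 'hello\n' | git hash-object --stdin)

# Change one character and the ID looks nothing like the first
changed=$(printf 'hellp\n' | git hash-object --stdin)

echo "$first"
echo "$changed"
echo "${#first} hex characters"
```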
Remotes#
Remotes are git's way of pointing to non-local repository locations (like Bitbucket or GitHub, for instance)
When you want your local changes on the git server: git push
When you want to get everyone's latest changes from the server: git pull
this fetches changes from the remote repository and merges them into your local copy
you can get remote changes without merging them into your local copy with git fetch
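The difference between fetch and pull can be seen with a local bare repository standing in for the server. This is only a sketch: the directory names (server.git, alice, bob), the branch name main and the identity are all made up.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git                      # stands in for the hosted repo
git -C server.git symbolic-ref HEAD refs/heads/main

# First clone: commit and push
git clone -q server.git alice 2>/dev/null          # cloning an empty repo prints a warning
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first"
git push -q origin main

# Second clone, taken before the next push
cd "$work"
git clone -q server.git bob

cd "$work/alice"
echo "v2" >> file.txt
git commit -q -am "second"
git push -q origin main

cd "$work/bob"
before=$(git rev-parse HEAD)
git fetch -q origin
after_fetch=$(git rev-parse HEAD)                  # unchanged: fetch only updates origin/main
remote_tip=$(git rev-parse origin/main)
git pull -q origin main                            # fetch + merge into the current branch
after_pull=$(git rev-parse HEAD)                   # now equal to origin/main
```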
Managing remotes#
In your local copy of a repository, you have control over what remotes your code is sent to/retrieved from
The default remote is origin
When first setting up a repository with git clone, origin will be set to the URL you're cloning from
List configured remotes:
git remote -v
Add a new remote:
git remote add <name> <git_url>
Update the URL of a remote:
git remote set-url <name> <new_git_url>
All push/pull commands can be directed at specific remotes
ex: git push origin (push my local code on my current branch to origin)
Another common remote name: upstream
Used when ‘forking’ another repository on GitHub
ex: len0rd/pyftdi
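The remote commands above can be tried without any network access. In this sketch a local bare repo plays the role of origin, and the upstream URLs are made-up examples that are never contacted.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git                  # stands in for the repo you cloned from
git clone -q server.git local 2>/dev/null      # cloning an empty repo prints a warning
cd local

git remote -v                                  # origin, listed once for fetch and once for push
origin_url=$(git remote get-url origin)

# Add a second remote, then point it somewhere else (hypothetical URLs)
git remote add upstream https://example.com/original/project.git
git remote set-url upstream https://example.com/original/renamed.git
upstream_url=$(git remote get-url upstream)

remotes=$(git remote | sort | tr '\n' ' ')     # "origin upstream "
```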
Branching#
fundamental to collaborative work in a repository
Branches are just nice pointers to commit SHAs
Under the hood: branches live in .git/refs/. They're just files containing the commit SHA they currently point to
[Figure: https://www.freecodecamp.org/news/content/images/2020/12/image-42.png]
Branch Commands#
Create new branch: git branch <new_branch_name>
Select branch for working directory: git checkout <branch_name>
Shortcut (create and select in one step): git checkout -b <new_branch_name>
What branch am I on? git status
HEAD: special git pointer that points to the branch currently checked out in your local copy
[Figure: https://www.freecodecamp.org/news/content/images/2020/12/image-43.png]
Adding a commit to a branch. Branches automatically update to the latest commit applied to them
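A quick way to convince yourself that branches and HEAD are just pointers is to read the files directly. A sketch, assuming a fresh throwaway repo (in long-lived repos refs may be packed into .git/packed-refs instead of loose files); the branch names main and feature/a are made up.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/main        # pin the initial branch name for this demo
git config user.name "Demo"                  # placeholder identity, local to this repo
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "first"

git branch feature/a                         # create a branch (does not switch to it)
git checkout -q feature/a                    # point the working directory at it
# one-step equivalent: git checkout -b feature/a

head_ref=$(cat .git/HEAD)                    # "ref: refs/heads/feature/a"
branch_file=$(cat .git/refs/heads/feature/a) # the file holds a commit SHA
branch_sha=$(git rev-parse feature/a)        # same SHA, asked for via git

echo "two" >> file.txt
git commit -q -am "second"
# feature/a moved forward to the new commit; main still points at "first"
```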
Merging#
[Figure: ../../_images/git_merging.png]
Most common merge method: 3-way merge#
pictured above
The three commits used in this merge:
Head of destination
Head of branch to be merged
The common ancestor (allows git to know the exact diffs it's combining)
The merge operation itself is usually facilitated by an online tool, but you have to address merge conflicts manually
Start a merge:
git merge <SOURCE_BRANCH>
Note! All merge commands merge into the branch you are currently on
git checkout develop
# this will merge feature/a into develop
git merge feature/a
There are many other merge types/strategies that will be discussed later
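Here is a small sketch of a 3-way merge that also asks git which common ancestor it will use (git merge-base). The repo is a throwaway, and the file names and "Demo" identity are invented.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/develop     # start on develop for this demo
git config user.name "Demo"                  # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "common ancestor"
ancestor=$(git rev-parse HEAD)

git checkout -q -b feature/a
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q develop
echo "develop" > develop.txt
git add develop.txt
git commit -q -m "develop work"

merge_base=$(git merge-base develop feature/a)   # third input of the 3-way merge

# Merge feature/a INTO develop (the branch currently checked out)
git merge -q --no-edit feature/a
git log --oneline --graph                        # the merge commit has two parents
```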
Dealing with conflicts#
Conflicts can occur for a few reasons (when merging A and B):
file was deleted on A but edited on B
same portion of a file was edited on A and B
when there’s a conflict, git will halt the merge, mark the conflicts, and tell you to resolve them
Conflict resolution too hard? you can always back out:
git merge --abort
What a conflict looks like
hello
<<<<<<< HEAD
there! i want to add stuff here
=======
there.
another
line!
>>>>>>> feature/tmiller/more_stuff
To resolve, you need to decide if you want the current branch's version (HEAD / “ours”), the version from the branch being merged (<BRANCH_NAME> / “theirs”), or some combination of both
Regardless, you should always remove the marker lines (<<<<<<<, =======, >>>>>>>) before finishing the merge
GUIs/Editors can be helpful here (VSCode)
To finish the merge:
add conflicted (but now resolved) files with git add
commit with git commit (git will autofill the message)
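The whole conflict cycle can be reproduced in a throwaway repo. This sketch reuses the text from the example above; the file name greeting.txt, the branch names and the "Demo" identity are made up, and the resolution chosen here (a combination of both sides) is just one option.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"                  # placeholder identity, local to this repo
git config user.email "demo@example.com"

printf 'hello\nthere\n' > greeting.txt
git add greeting.txt
git commit -q -m "base"

git checkout -q -b feature/a
printf 'hello\nthere.\nanother\nline!\n' > greeting.txt
git commit -q -am "feature edit"

git checkout -q develop
printf 'hello\nthere! i want to add stuff here\n' > greeting.txt
git commit -q -am "develop edit"

# Both branches changed the same line, so git halts the merge
git merge feature/a > /dev/null 2>&1 || echo "merge stopped on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)   # files still unresolved
cat greeting.txt                                     # shows the conflict markers

# Resolve: write the final content without any marker lines
printf 'hello\nthere! i want to add stuff here\nanother\nline!\n' > greeting.txt
git add greeting.txt
git commit -q --no-edit                              # keeps the autofilled merge message
```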
Stashing#
Temporarily save working-tree changes so you can come back to them later
Useful when you need to switch branches but your current work isn't ready to commit yet
(Mercurial has a plugin called shelve that works in a similar way)
Stashes are stored in a stack-like structure
stack-like because it works like a stack by default, but you can operate on things in the middle of the stack without issue
Working with stashes#
Current stashes:
git stash list
See what’s in a stash:
git stash show [-p] [STASH_NAME]
Apply a stash:
git stash apply [STASH_NAME]
Drop a stash from the stack:
git stash drop [STASH_NAME]
Apply the latest stash and drop from the stack:
git stash pop
How stashes work#
Internally each stash is just a set of commit objects stored in the .git/ folder
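A short sketch of a stash round trip, including a peek at the stash as a commit object. It assumes a throwaway repo; the file name and "Demo" identity are invented.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo"                  # placeholder identity, local to this repo
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first"

echo "work in progress" >> file.txt
git stash push -q                            # set the edit aside
after_stash=$(cat file.txt)                  # back to the committed content

git stash list                               # one entry: stash@{0}
stash_type=$(git cat-file -t "stash@{0}")    # the stash entry is a commit object
git stash show -p "stash@{0}"                # the diff held in the stash

git stash pop -q                             # re-apply the latest stash and drop it
after_pop=$(tail -n 1 file.txt)
remaining=$(git stash list)                  # empty again
```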
Aliases - make git more convenient#
I use bash aliases to make using git from a terminal faster/easier. These can be added to your ~/.bashrc for more convenient git usage. Here are some that you may find useful.
alias gs='git status'
alias ga='git add'
alias gc='git commit'
alias gpush='git push'
alias gpull='git pull'
# update git submodules to the correct commit
alias gsub='git submodule update --init --recursive'
# prune away local copies of branches that have been merged on remote
alias gprune='git fetch -p && git branch --merged | grep -i -v -E "master|develop|main|dev|development" | xargs git branch -d'
alias gdiff='git diff'
# git log with one line per commit and a nice graph
alias glog="git log --oneline --decorate --graph --branches --remotes --tags"
Git also has its own alias system. Aliases are stored in Git's config file (~/.gitconfig). You can use these to perform more complex git commands. All aliases defined here are invoked with a git prefix. Example:
[alias]
# list most recently visited branches see http://ses4j.github.io/2020/04/01/git-alias-recent-branches/
recent = !git reflog show --pretty=format:'%gs ~ %gd' --date=relative | grep 'checkout:' | grep -oE '[^ ]+ ~ .*' | awk -F~ '!seen[$1]++' | head -n 10 | awk -F' ~ HEAD@{' '{printf(\" \\033[33m%s: \\033[37m %s\\033[0m\\n\", substr($2, 1, length($2)-1), $1)}'
Usage: git recent
At the end of the day, most of the git commands you use day to day (the “porcelain”) are convenience wrappers around a set of more fundamental (“plumbing”) commands
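Git aliases can also be defined from the command line with git config instead of editing the file by hand. In this sketch the alias names lg and last are made up, and the aliases are written to the throwaway repo's own config; adding --global would store them in ~/.gitconfig.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo"                  # placeholder identity, local to this repo
git config user.email "demo@example.com"
echo "hi" > readme.txt
git add readme.txt
git commit -q -m "first commit"

# Without --global the alias lands in this repo's .git/config
git config alias.lg "log --oneline --graph"
git config alias.last "log -1 --format=%s"

git lg                                       # same as: git log --oneline --graph
last_subject=$(git last)                     # subject line of the newest commit
```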
Git Flow#
“git flow” is the name of a widely used branching model for git usage on collaborative projects.
Most teams using git follow some or all of git flow
The highlights:
Primary work branch, typically develop, where all new features get merged
Committing directly to the develop branch is not allowed
Add features or fix bugs in separate feature / bugfix branches. Then merge them into develop
For releases, you should branch off develop and tag the commit used for a specific release
Release branches should only receive bugfixes, if anything
The release branch is merged back into develop
Reality: sometimes a release will have hacky/temporary fixes to get a product out the door. So the merge back into develop will contain a portion of the release/ branch changes, not all
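The highlights above map onto a handful of commands. This is a local sketch only: the branch names feature/login and release/1.0 and the tag v1.0.0 are made up, and on a real project the merges into develop would normally happen through the git host, not on your machine.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/develop     # start on develop for this demo
git config user.name "Demo"                  # placeholder identity, local to this repo
git config user.email "demo@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "initial"

# Feature work happens on its own branch, never directly on develop
git checkout -q -b feature/login
echo "login" > login.txt
git add login.txt
git commit -q -m "add login"

# Merge the finished feature into develop, then delete the branch
git checkout -q develop
git merge -q --no-ff --no-edit feature/login
git branch -q -d feature/login

# Cut a release branch off develop and tag the released commit
git checkout -q -b release/1.0
git tag v1.0.0

# A bugfix lands on the release branch, then flows back into develop
echo "fix" >> app.txt
git commit -q -am "release bugfix"
git checkout -q develop
git merge -q --no-ff --no-edit release/1.0

git log --oneline --graph
```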
Other recommendations#
Avoid copy-pasting third-party code into a project
It's a pain to track code that was copied in: it adds a bunch of unnecessary stuff to your history and can be difficult to update
Alternatives:
use a package manager built for the source code language
Fork + Submodule strategy (discussed later)
guarantees the code is always internally accessible
easy to track internal changes to the code
easy to pull in upstream fixes/updates
easy to push any approved internal changes upstream
Delete branches once they are merged
Unless there's a strong reason not to (like if it's a release/ branch)
Most git hosts (GitHub, Bitbucket) allow you to set up rules to enforce git-flow. Use them.
Restrict commits on develop / release/ branches to merge commits only
Require approval to merge
Require build/tests to pass to merge
Terminology Reference#
When searching online, it's helpful to use the proper terms to describe your problem
| term | explanation |
|---|---|
| Repository / repo | a project tracked by git. a folder that contains a |
| Index | aka “Staging”. Area for files/diffs before they are committed |
| Working directory | Your local copy of the repository where you make changes |
| Commits | Snapshot of repository state |
| History | The commits, branches, merges etc in a repository make up that repo's history |
| SHA | Secure Hash Algorithm. A unique 40 character code used to identify all git objects |
Other Useful Commands#
| Description | Command(s) |
|---|---|
| List commits between refs | |
| Stash a specific file | |
| Create a patch of current diff | |
| Create a patch of commits (range) | |
| Apply patches | |
| Pull rebased branch from remote (also reset an amended commit). Will DELETE local working tree changes! | |
| Put last commit back into staging | |
| Update Remote URL | |
| Delete a local branch | |
| Graph of latest commits | |
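The commands for these tasks are not shown in the table, so what follows is only a sketch of commonly used git commands for several of them. These are assumptions, not necessarily the commands originally listed. The repo is a throwaway, and the file names, the branch name scratch and the "Demo" identity are invented.

```shell
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"                  # placeholder identity, local to this repo
git config user.email "demo@example.com"
for n in one two three; do
    echo "$n" > "$n.txt"
    git add "$n.txt"
    git commit -q -m "$n"
done
first=$(git rev-parse HEAD~2)

# List commits between refs: reachable from HEAD but not from $first
between=$(git log --oneline "$first"..HEAD)

# Stash a single file and leave other edits in the working tree
echo "edit" >> one.txt
echo "edit" >> two.txt
git stash push -q -- one.txt
left_behind=$(git status --short)            # only two.txt is still modified
git stash pop -q

# Create a patch from the current diff, discard the edits, then apply the patch
git diff > "$work/wip.patch"
git checkout -q -- one.txt two.txt
git apply "$work/wip.patch"
# (for a range of commits: git format-patch <range>, applied with git am)

# Put the last commit back into staging
git reset -q --soft HEAD~1
staged=$(git diff --cached --name-only)      # three.txt is staged again

# Delete a local branch
git branch scratch
git branch -q -d scratch
```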