Mastering Git and GitHub: Essential Skills for Collaborative Development
Do you know who this is? It’s quite an “unpleasant person” named Linus Torvalds. And I’m not being hostile or offensive here. He admitted this himself, saying: “I'm an egotistical bastard, and I name all my projects after myself. First ‘Linux,’ now ‘Git’” (“git” means an unpleasant person in British English). You may be surprised: this man created not only the most used operating system, running at least half the Internet, but also the tool used to build that same part of the Internet. So why would he call himself such unpleasant names? We’ll leave that for your investigation and switch to the topic highlighted in the title.
There is a low chance you’ve never heard of Git before if you have at least some familiarity with programming. However, even if you haven’t, do not worry. It’s not that difficult; you do not need any preparatory reading. We’ll go step by step.
What is a version control system, is Git one of those, and are there alternatives?
Imagine typing in Google Docs and being unable to undo the last change. For most people, it would be unpleasant but still bearable. I say bearable because most of the time (note that I said most, not every time!), you’re focusing on one text segment without too much jumping back and forth. Writing code is usually quite different. Unless you write a simple script, you’re working on dozens of files simultaneously: e.g., you have a function that accepts an integer and ten files where this function is called. You change the parameter type from an integer to a string. And then switch back. To some extent, your IDE or advanced code editor will help you with this, but only if it happens in a single session. Should you save your changes and return to them later, undoing all of this becomes laborious and error-prone.
To help us with that in the code world, we have version control systems, which are:
- systems: a set of things working together that we will dive deep into
- to control: there are hundreds of commands for you to execute
- versions: which are just states of your codebase at any specific time you decide to create a new version
You may also stumble upon the acronym SCM, which stands for source code management. But you get the idea. Git is used to improve how we store code and navigate changes.
It’s also important to mention that Git is considered a distributed version control system. It means that the code, its versions, and everything related to managing the version isn’t stored in a single place. Every developer who has access to it acts as a full-fledged collaborator, not a client of a centralized server. And you don’t even need GitHub, GitLab, or any other service (again, don’t worry if you don’t know what those are).
Git is not the only solution for this, but we can confidently say it’s a monopolist. Look at this table to understand the situation:
People sometimes mention that Git is imperfect, but with its adoption, it’s difficult to imagine a solution that will appear and win over Git. To do it, it should bring some real benefits and have most of the functionality available in Git from day one. So, if there is a technology you can learn without being afraid it may soon become outdated, it’s Git.
GitHub, GitLab, Bitbucket: what are those?
In a nutshell, all of these services are source code repository hosting services. How are they different from Dropbox or Google Drive, which can technically be used for the same goal? They utilize Git, which means they store not only the code but all the “Git elements” that we’ll be discussing in this article. Moreover, they can work with it, so you don’t have to download the code to navigate its versions, see who made changes, etc. All of this is available directly in the browser.
While Git is a standard technology, the knowledge of which will benefit every developer, these platforms are unique. They provide different functions, have different financial plans, and have various selling features. For example, GitHub is most known for hosting many open-source products. Developers even use it as their portfolios. GitLab is known for its open-source nature. So, you can read the whole platform's code yourself. And Bitbucket, well, it exists 🙂
It’s also important to understand that all these platforms provide much more than hosting code. For example, they provide CI/CD functionality that perfectly integrates into a usual workflow. You can configure a pipeline that will be executed each time you update your code or at a specific condition. You can also release your code. But we won’t go into that much detail in this article.
We’ll discuss GitHub later after examining some of the main Git concepts.
Git client
The default way of interacting with Git is through a command-line interface. We will use it in this topic because it’s not that difficult, and knowing precisely how the tools work is usually beneficial. An alternative would be Git clients that add a graphical UI on top of Git. Examples are GitKraken, Sourcetree, GitHub Desktop, plugins for VS Code or other IDEs, etc.
But again, although these programs may simplify some tasks, their main disadvantage is that they hide how you communicate with Git natively. Knowing the native commands may prove helpful should you use Git on a machine where it’s not possible or feasible to install a dedicated UI tool.
Git workflow in a nutshell
git init / git clone
git log
git pull
git checkout
git merge / git rebase
git add
git status
git diff
git commit
git push
Although there are more than 150 commands in Git, this set will cover your day-to-day work, and you will need nothing else in 95% (not statistically measured) of the cases. Moreover, you’ll probably never learn at least 100 of the other commands. The reason is that they are highly specific: people usually look them up in the documentation, use them once, and successfully forget them till the next time.
Still, even a set of 10+ commands might look intimidating to some people, and that’s okay. It is important to understand that just “knowing” them won’t do the trick; you must start using them immediately to learn how and when they are helpful.
In the beginning, there is a chance you’ll be copying your whole project to a neighboring directory, just in case. And although eventually Git will save you from doing precisely this, it’s ok to start like that. It’s just a tool, and it’s normal not to trust it at first. Once you get used to it and start trusting it, you’ll treat Git as a place where, if something is there, it will not be lost.
git config
We start with a command I didn’t mention in the previous section. It’s the command you would use occasionally, such as to set up a new machine or add a specific configuration to a particular repository.
All the changes made with this command are written to one of the files:
- [path]/etc/gitconfig if you modify system-level settings
- ~/.gitconfig or ~/.config/git/config if you modify the settings for your user only
- .git/config in the repository directory if you modify repository-specific settings
Nothing prevents you from modifying these files directly, but it’s safer to do it from the command line. This way, you can be sure each value lands in the file you intended and in valid syntax.
Here are some of the basic commands you’d need to execute on a new machine:
git config --global user.name "Your name"
git config --global user.email "Your email"
These commands will set your identity, which will be visible when analyzing code and other user changes.
In the future, you might use git config user.email "Your other email" in a specific repository to have a different identity there. In this case, the configuration will only apply to the repository in which you ran the command. Everywhere else, your global email will be used and displayed.
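Here is a minimal sketch of how a repository-level value wins over the global one. The names, emails, and the work-project directory are made up for illustration, and the demo points HOME at a scratch directory so it does not edit your real ~/.gitconfig.

```shell
# Demo-only assumption: use a throwaway HOME so the real global config is not edited.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME
cd "$HOME"

# Global identity: applies to every repository of this user.
git config --global user.name "Jane Doe"
git config --global user.email "jane@example.com"

# Repository-level override: stored in .git/config of this repository only.
git init -q work-project
cd work-project
git config user.email "jane@work.example"

git config --global user.email        # prints jane@example.com
git config user.email                 # prints jane@work.example
git config --show-origin user.email   # also prints the file the value comes from
```

The name was never overridden, so inside work-project Git still falls back to the global user.name.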
git config --global core.editor "code --wait"
You can skip this step if you have a system editor set up and it’s okay for you to use it with Git. If not, use this command to choose the editor Git opens whenever it needs you to edit some text (here, VS Code; the --wait flag makes Git wait until you close the file). It doesn’t limit you to working on your projects in that specific editor; it only covers the Git-related stuff: configs, messages, interactive rebases, etc.
If you ever stumbled into a UI that looks like this and you had no idea what it was or how to work with it, even if you had to restart your computer because there was no other choice… that’s okay. Most of us were there, too!
There are countless other configurations available in the documentation. Some help you avoid missing confirmations. Others allow you to set the default behavior for commands. Most of the time, the default settings will be okay, but it’s worth looking into the others occasionally because a single change may improve how you work with Git.
Git repository
There is no git repository command, but 99% (again, not statistically measured) of your work will be done in a so-called Git repository. In a nutshell, a Git repository is a directory containing a .git directory inside it. Theoretically, you can create a Git repository by creating this directory and filling it with the necessary data yourself. Still, it’s never advised because there is a command called git init that does the same with all the boilerplate happening automatically. It runs instantly, and right after it, you’re free to use all the Git commands in this repository.
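As a quick sketch, this is what creating a repository looks like in a scratch directory (my-project is just a placeholder name):

```shell
# Create a scratch directory and turn it into a Git repository.
cd "$(mktemp -d)"
mkdir my-project        # hypothetical project name
cd my-project
git init -q

# The .git directory is what makes this directory a repository.
ls -d .git
git rev-parse --is-inside-work-tree   # prints "true"
```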
You’ll often use existing repositories created by other developers. Although theoretically you can download them in other ways (for example, as a .zip file), there is a dedicated command for getting existing repositories called git clone.
By default, git clone downloads all the data of the repo (all the branches, commits, etc.) into a new directory inside the current one, named after the repo. However, there are some arguments to change the behavior. You can decide where this repo should be downloaded by specifying the directory. With --branch <name>, you can choose which branch is checked out after cloning, and adding --single-branch tells Git that you’re only interested in that branch. And --depth=1 will download only the most recent commit. All these additional arguments are helpful if you intend to clone a repository containing years of history you don’t need. However, most of the time git clone <repo> will do just fine.
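To see the difference between a plain and a trimmed-down clone, here is a small sketch. It builds a two-commit repository locally to play the role of the remote; the directory names (remote/project, shallow-copy) and the demo identity are assumptions for illustration only.

```shell
cd "$(mktemp -d)"

# Build a small "remote" repository with two commits to clone from.
mkdir -p remote/project
cd remote/project
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt && git add app.txt && git commit -q -m "First commit"
echo "v2" > app.txt && git commit -q -am "Second commit"
cd ../..

# Plain clone: full history, directory named after the repository ("project").
git clone -q remote/project

# Custom directory, a single branch, and only the most recent commit.
# (--depth is ignored for plain local paths, so the path is written as a file:// URL.)
git clone -q --branch main --single-branch --depth=1 "file://$PWD/remote/project" shallow-copy

git -C project rev-list --count HEAD        # 2 commits in the full clone
git -C shallow-copy rev-list --count HEAD   # 1 commit in the shallow clone
```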
git log
If you create a new repository from scratch, the git log command will be useless. However, if you cloned an existing one, there is a chance you’ll start with it. It displays the history of all the commits (we’ll talk later about what these are) in this repository, along with the message left by the committer, their information, and the date. git log is quite a versatile command with a lot of parameters that help you use it to the fullest extent. You can:
- Limit the number of commits displayed to you
- Limit commits by author, date, modified file, content, or range
- Remove redundant details and leave just essentials, displaying the whole history in a line
- On the contrary, display the history as a whole graph
- Change the format of the output and a lot more
The git log screen may initially seem a bit overwhelming. Still, when you start using it regularly, you’ll get used to it and find the specific parameters that are useful to you.
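The options listed above map to flags roughly like this. It's a sketch on a throwaway repository with three commits; the file names, messages, and the "Demo User" identity are made up.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "a" > a.txt && git add a.txt && git commit -q -m "Add a.txt"
echo "b" > b.txt && git add b.txt && git commit -q -m "Add b.txt"
echo "a2" >> a.txt && git commit -q -am "Update a.txt"

git log -n 2                       # only the two most recent commits
git log --author="Demo User"       # commits by a given author
git log --since="2 weeks ago"      # commits newer than a date
git log --oneline -- a.txt         # commits that touched a.txt
git log -S "a2" --oneline          # commits that added or removed the text "a2"
git log --oneline HEAD~2..HEAD     # a range of commits
git log --oneline                  # the whole history, one line per commit
git log --graph --oneline --all    # the history drawn as a graph
```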
Essential Git terms
Now that we’ve looked at three different commands, it’s important to discuss the essential Git terms, which are crucial to understanding the rest of the commands.
There are three main areas in Git: the working directory, the staging area, and the repository itself. A file belongs to one or multiple areas based on its location inside the root of your project or the .git directory. However, this nuance is unimportant because you’ll never move files manually. For this, you will use commands that will be discussed further. Still, here are the areas:
- Working directory. It is your project’s top directory with all of its files and subdirectories. This is something that Git doesn’t control but only observes.
- Staging area (also called the index). It’s an intermediary area where objects usually do not stay for long before going to the repository. As a developer, you have to explicitly send your files to the staging area using the git add command.
- Repository. The term is quite ambiguous because people usually call multiple things the repository, including the project's top directory. However, in the Git world, the repository represents the files committed explicitly with git commit. Again, these actions do not modify your project files. They, however, leave records in the .git directory marking the right files.
It’s ok if you don’t understand it yet. It’s also ok if you’re particularly baffled by this seemingly redundant staging area. When people first start using Git, they often cannot see why it is needed. In short, it lets you be granular about managing your repository: you pick which of your changes go into each state point (commit) of your code. Moreover, it helps resolve conflicts that arise due to the nature of Git as a collaborative tool.
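If the three areas still feel abstract, this small sketch puts a different version of the same file into each of them and then reads all three back. The file name and the demo identity are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"     # "version 1" is now in the repository

echo "version 2" > notes.txt
git add notes.txt                # "version 2" is now in the staging area

echo "version 3" > notes.txt     # "version 3" exists only in the working directory

git show HEAD:notes.txt   # repository (last commit): version 1
git show :notes.txt       # staging area (index):     version 2
cat notes.txt             # working directory:        version 3
```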
Commit is one of the most essential terms in Git. A commit represents the state of your code at a specific point in time, a snapshot. Commits generate the history of your repository, allowing you to trace how changes were made and return to any point to analyze or even run the code at that point.
A branch is another essential thing in the Git world. If a commit is a snapshot, a branch is a pointer to that snapshot. Creating a branch makes minimal changes to the repository, merely creating a new pointer. “Being on that branch” and making new commits makes them appear under that branch and not on the main trunk. Try drawing it for yourself to understand the concept better when working with it. Although some stuff is happening under the hood, the whole abstraction is intuitive, so drawing should be helpful.
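The "branch is a pointer" idea can be checked directly. In this sketch (branch and file names are made up), a freshly created branch resolves to exactly the same commit as main, and only moves once you commit while being on it.

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > readme.txt && git add readme.txt && git commit -q -m "Initial commit"

# A new branch is just another name for the same commit.
git branch feature
git rev-parse main feature        # prints the same commit hash twice

# Committing while on "feature" moves only that pointer forward.
git checkout -q feature
echo "more" >> readme.txt && git commit -q -am "Work on feature"
git rev-parse main feature        # now the two hashes differ
git log --oneline --graph --all
```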
Let’s move to the commands that we use to work with Git.
git add, status, commit
git commit is one of the most important commands in the Git world: it lets you create snapshots of your repository. However, it’s important to note that Git lets you manually mark the files you want to be committed instead of automatically making a snapshot of the whole repository. The marking happens by moving files to the staging area, which we discussed in the previous section. The moving is done with the git add command. Most of the time, you will use it this way: git add . (with a dot). The dot tells Git to add all the files in the current directory and its subdirectories that were changed since the last commit to the staging area. It makes sense if your work focuses on a single feature or a bug fix. However, sometimes, you’d be more granular: add specific files or use patterns. For example, git add *.js will let you add only the JavaScript files in the current directory, ignoring all the rest. Again, this doesn’t mess with the actual files in your project. It adds them to the staging area, which is essential for committing.
At some point, you’d like to know which files were changed since the last snapshot and which are already in the staging area. For this, we have a dedicated command called git status. It displays the state of the working directory and the staging area, which are the first two areas in the Git world. This command uses nice and straightforward color coding to convey the state (green for staged files, red for unstaged ones). Besides colors, there are different symbols to help you better understand the state of each changed file.
Finally, after adding the necessary files to the staging area and verifying it using git status, it’s time to create a snapshot using the git commit command. Running this command as it is will open the default editor for you to add a message describing the changes. However, most of the time, people use the -m parameter and add the message to the command itself, which prevents the editor from opening: git commit -m "<your message>".
After you commit, you can check the state of your working directory and staging area again using git status. Also, you can use git log to see your new commit as a part of the repository's history.
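Put together, the add, status, commit cycle looks roughly like this. It's a sketch on a throwaway repository; the file names are made up, and --short is used so the state fits on one line per file.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "console.log('hi');" > app.js
echo "draft" > todo.txt

git status --short          # both files are untracked: "?? app.js" and "?? todo.txt"
git add *.js                # stage only the JavaScript file
git status --short          # "A  app.js" is staged, "?? todo.txt" is not
git commit -q -m "Add app.js"
git status --short          # only "?? todo.txt" remains
git log --oneline           # the new commit is part of the history
```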
git branch/git checkout
When you create a new Git repository, it has a default branch, usually named main or master. However, there is a high chance you won’t create commits on this branch most of the time. It’s just not a good practice, and branches let you avoid it, helping you manage the work more meaningfully.
There is a specific command called git branch. In short, it allows you to list, create, and delete branches, but as with other Git commands, it has dozens of parameters that you wouldn’t use 99% of the time (classical, no statistics). However, as a rule of thumb, if some unusual use case makes you try random actions, read the docs first. There is a high chance someone stumbled upon the same case before and Git already has a solution for it.
Moreover, most of the time, you won’t even use git branch to create branches. Instead, you'd use the git checkout command with the -b <branch_name> parameter to create a new branch and immediately switch to it.
I say switch because it’s an intuitive term for describing what is happening. And there is even a dedicated command, git switch. However, for a long time, git checkout was used to do the same operation. It’s called this way because it “puts” files (checks them out) from a specific branch or even a commit into the working directory and makes them available to the user. It’s as if you had a lens through which you could move throughout the history of your changes but stop on a single thing at a time.
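A short sketch of creating and switching branches, assuming a reasonably recent Git (git switch and --show-current appeared around versions 2.23 and 2.22). The branch name feature/login is hypothetical.

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > readme.txt && git add readme.txt && git commit -q -m "Initial commit"

git checkout -b feature/login     # create a branch and switch to it in one step
git branch                        # lists branches; "*" marks the current one
git checkout main                 # go back to main
git switch feature/login          # the newer command dedicated to switching
git branch --show-current         # prints feature/login
```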
git push/git fetch/git pull
We’ve already discussed the git clone command that allows us to get someone’s Git repository with all the project files and the files that actually make it a Git repository. However, Git doesn’t work like Google Drive or Microsoft OneDrive, automatically synchronizing all the changes when they are made. You need to ask for those explicitly, and it’s done with a command called git pull. Technically, git pull is a shortcut that runs two commands: git fetch and git merge. The first one downloads all the updates made by others (including the Git updates). However, running just git fetch won’t let you see those changes immediately. You need to merge them into your branch so they appear in your working directory. There are multiple merge strategies available in Git, and there is a configuration allowing you to decide which will be used by default. Having it set up, you can simply use git pull, which will do both things. This is the command developers usually use.
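To watch fetch and merge work separately and then together, here is a sketch with two local directories: upstream stands in for the shared repository and local is your clone. Both names, the file, and the "Teammate" identity are assumptions for the demo.

```shell
cd "$(mktemp -d)"

# "upstream" stands in for the shared repository; "local" is your clone of it.
mkdir upstream && cd upstream
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "v1" > app.txt && git add app.txt && git commit -q -m "First commit"
cd ..
git clone -q upstream local

# A teammate adds a commit upstream after you cloned.
echo "v2" > upstream/app.txt
git -C upstream commit -q -am "Second commit"

cd local
git fetch -q                          # downloads the new commit...
cat app.txt                           # ...but the working directory still says v1
git log --oneline main..origin/main   # the commit that is waiting to be merged
git merge -q origin/main              # bring it into your branch
cat app.txt                           # now v2

# git pull performs the fetch and the merge in one step.
echo "v3" > ../upstream/app.txt
git -C ../upstream commit -q -am "Third commit"
git pull -q
cat app.txt                           # now v3
```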
Most of the time, git pull is the first command you execute before starting to work on your own features. This way, you’ll see all the recent changes and be able to understand how your planned changes fit in.
After the changes are done, considering you work in a team, you’d need to “share” them with others. The git push command is used for this. It sends your commits to a specific remote, which you’d need to configure if it’s a new repository. However, if you cloned the repository, all of this is preconfigured and ready to use.
After you push changes to a hosting service (GitHub, GitLab, etc.), your branches, commits, and the actual code will be available to everyone.
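Here is a sketch of that sharing step without any real hosting service: a local bare repository (server.git) plays the role of GitHub or GitLab, and the me and teammate directories are two clones of it. All names are placeholders.

```shell
cd "$(mktemp -d)"

# Setup: a bare repository with one commit stands in for the hosting service.
mkdir seed && cd seed
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt && git add app.txt && git commit -q -m "First commit"
cd ..
git clone -q --bare seed server.git

# Your clone: the remote called "origin" is preconfigured.
git clone -q server.git me
cd me
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote -v
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q origin main                 # share the commit with the remote
cd ..

# A teammate who clones afterwards receives your commit.
git clone -q server.git teammate
ls teammate                             # app.txt and feature.txt
```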
git merge/git rebase
In a perfect world, you’d pull changes made by others, do yours, and push them to the remote. However, it’s not difficult to imagine someone else modifying the same pieces of code at the same time as you. Situations like this cause conflicts that Git helps you to resolve.
Here’s a usual workflow which involves resolving conflicts:
- You pull changes from the remote
- You create a new branch from a “main” one
- You work on your own code, creating commits in the process
- When your work is done, you check out to the “main” branch and pull it once again
- You check out back to your branch and call git merge main
In a perfect scenario, Git automatically merges all the changes into your branch, but most of the time, there are conflicts.
To help you resolve them, Git adds special markers in the code on the lines where conflicts cannot be resolved automatically. (It’s one of those moments where Git modifies your files directly). It shows you your version of the code and the version made by others. There are some tools with UI that let you quickly decide which side to choose: ours or theirs, but you can do it manually as well. Finally, no one prevents you from manually merging both changes in a way Git would never be able to.
After the merge is done, a new commit is added to your branch to specify the merge.
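The workflow above can be reproduced in miniature. This sketch creates a conflict on purpose (both branches change the same line of a made-up settings.txt), lets Git insert its markers, and then resolves it by hand.

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "color = blue" > settings.txt
git add settings.txt && git commit -q -m "Add settings"

# Your branch and main change the same line in different ways.
git checkout -q -b feature
echo "color = green" > settings.txt && git commit -q -am "Use green"
git checkout -q main
echo "color = red" > settings.txt && git commit -q -am "Use red"

git checkout -q feature
git merge main || true      # stops and reports a conflict in settings.txt
cat settings.txt            # both versions, separated by <<<<<<<, ======= and >>>>>>> markers

# Resolve by editing the file into its final form, then stage and commit.
echo "color = teal" > settings.txt
git add settings.txt
git commit -q -m "Merge main into feature"
git log --oneline --graph   # the merge commit joins both lines of history
```

Picking "teal" here is the kind of manual resolution Git could never produce by itself: it is neither "ours" nor "theirs".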
However, some teams prefer another approach for resolving conflicts and keeping their branches up-to-date. It’s called git rebase. Instead of bringing all the incoming changes to the tip of your branch, it moves all your commits to the tip of the branch you want to merge onto. In a way, it rewrites the history of your branch but helps you avoid those “merge commits” in the history.
Usually, the decision between merge and rebase is made on a team level, especially when it concerns keeping the state of the whole tree.
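For comparison, here is the same kind of diverged history handled with a rebase. It's a sketch with made-up file names and no conflicting edits, so the rebase runs through without stopping.

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Feature work"
git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Main work"

git checkout -q feature
git rebase -q main              # replay the feature commit on top of main
git log --oneline --graph       # a straight line: no merge commit
```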
Advanced Git
Most of the time, you will use the commands described above; however, sometimes, there will be cases when you need something else to be accomplished. I’ll mention them here for you to know they exist:
- git stash - sets aside the uncommitted changes of a “dirty” working directory. It keeps all of your changes but doesn’t leave them in the history
- git revert - reverts changes made by other commits and adds a new commit to the history that represents the undone changes
- git blame - allows you to see who last modified each and every line of code. Most modern IDEs show it directly in the code
- git tag - a way to add permanent marks in the Git history, most commonly used to mark specific commits
- git cherry-pick - basically, copying commits from one branch to another
- git reflog - in contrast to git log, this command shows all the actions made on a repository (every move of HEAD), allowing you to restore some seemingly lost stuff
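A few of these can be tried safely on a throwaway repository. The sketch below (file name, tag name, and identity are placeholders) stashes an unfinished edit, reverts a commit, tags the result, and prints the blame and reflog views.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > file.txt && git add file.txt && git commit -q -m "Add file"
echo "two" >> file.txt && git commit -q -am "Add second line"

# git stash: set an unfinished edit aside, then bring it back.
echo "work in progress" >> file.txt
git stash                       # the working directory is clean again
git stash pop                   # the unfinished edit is back
git checkout -- file.txt        # discard it for the rest of the demo

# git revert: undo the last commit by adding a new commit.
git revert --no-edit HEAD       # file.txt contains only "one" again

# git tag: a permanent name for the current commit.
git tag v1.0

# git blame: who last touched each line; git reflog: every move of HEAD.
git blame file.txt
git reflog
```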
GitHub essentials
In a nutshell, mastering the basics of GitHub is much easier than mastering Git because most of the work is usually done with Git.
Let’s refresh our memories on what GitHub is and what it is not. Technically, GitHub is not some kind of server repository or even a central repository. It’s the same kind of repository that every developer who cloned it has on their computers. This is just how Git is designed. However, we often treat GitHub as the source of truth and resolve conflicts against the latest changes in GitHub instead of the changes on any developer’s machine.
There is a concept of GitHub repositories, each used to represent the Git repository. However, it’s important to note that there is much more to the GitHub repository compared to the Git one, mainly in terms of features that GitHub provides specifically:
- It can be public or private
- Issues that help developers track problems that need to be fixed or ideas to implement
- Actions: a built-in CI/CD tool
- Projects: project-tracking software (like Jira, Trello, etc.), also built into GitHub
- And much, much more
With GitHub, you can freely navigate the code base, switch branches, browse commits, and even make changes, creating commits correspondingly. However, this is not considered good practice.
If you take a GitHub repository link and clone it to your local machine, it’ll automatically set up GitHub as its remote, which allows you to push, pull, and do the rest. However, if you create a new Git repository locally, you’d need to take some extra steps to make it all work. You’d need to take the repo URL and manually add it as a remote locally. From then on, everything will be the same as cloning from the existing repo.
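The extra steps for a locally created repository can be sketched like this. An empty bare repository (hosted.git) stands in for a freshly created GitHub repository; with the real thing, you would pass the URL shown on the repository page instead of the local path. All names are placeholders.

```shell
cd "$(mktemp -d)"

# Stand-in for an empty repository created on the hosting service.
git init -q --bare hosted.git

# A repository created locally has no remote yet.
mkdir my-project && cd my-project
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" on any Git version
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# My project" > README.md && git add README.md && git commit -q -m "Initial commit"

git remote add origin ../hosted.git     # register the remote under the usual name "origin"
git remote -v                           # lists the fetch and push URLs
git push -q -u origin main              # publish main and remember origin/main as its upstream
git status -sb                          # "## main...origin/main": the branch now tracks the remote
```

After the push with -u, plain git push and git pull work in this repository the same way they would in a clone.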
However, the most important element of GitHub is a feature called “Pull Requests.” This is how developers suggest their changes be merged into a GitHub version of the project repository they are working on. Initially, the name can be misleading, but it is basically a request for a repository to pull your changes and add them to the source of truth.
To make a pull request, you’d need your branch with your changes pushed to the remote (GitHub) repository. Then, you create a pull request using GitHub’s UI or their CLI client, specifying a source (yours) and target (main, develop, or others) branches.
A lot of things happen in the context of pull requests:
- It can trigger actions that will validate your code
- Other collaborators will be able to leave comments line by line
- It will identify if it’s possible to automatically merge changes from a source branch into a target branch
Conclusion
In the context of a project, libraries, frameworks, and sometimes, even languages change. But Git is so essential in today’s development world that you’d need to really try to find where it’s not used. Besides being versatile and powerful, it’s also very flexible: you can use it with different clients, rely on various hosting providers, and so on.
It’s ok if, in the beginning, you spend extra time trying to do things right with Git, probably even copying your whole repository to another directory “just in case,” but once you get used to it, you’ll understand that the reason for its adoption is not just a matter of habit: it’s a perfect tool for collaborative development.