11 The basics of Git and GitHub
11.1 Learning objectives
The learning objectives of this session are to:
- List and describe Git’s core functionality and purpose, and how GitHub expands on that.
- Explain the difference between Git and GitHub.
- Explain how GitHub differs from services like OneDrive or Dropbox, and what GitHub’s advantage is over them.
- Navigate to commonly used sections on GitHub, like the list of issues, notifications, and repositories.
- Create new repositories on GitHub.
11.2 💬 Discussion activity: Recall what you read during the pre-workshop tasks
Time: ~5 minutes.
Before we start the more practical part of the workshop, we’ll take some time to refresh your memory on what you read about Git and GitHub in the pre-workshop tasks. So:
- For 1 minute, recall what you understood about Git and GitHub from the pre-workshop tasks. Think about how you’d explain it to someone else.
- For 4 minutes, pair up with your neighbour and take turns explaining to them what you remember, 2 minutes each.
11.3 📖 Reading task: What is version control and Git?
After they’ve read it, take some time to repeat some key points from the text, such as:
- Emphasising how people usually version files.
- Highlighting that Git can track any file type, but that Git has more features for text-based files.
- Reinforcing what “plain text” files are.
Time: ~5 minutes.
The text below is the same text you read for the pre-workshop tasks.
So, why are we asking you to discuss it and then read it again?
Because Git is hard to learn. It requires changing how you think about working with files, which often takes time to adjust to. By revisiting the material through reading, discussion, and rereading, we want to help you build familiarity with these concepts before moving on to the hands-on parts of the workshop.
In our work lives, we regularly work with files, such as creating or editing them. These files can be anything from text documents, to images, to code. When we work with files, we often make changes to them, and sometimes many changes. We might want to keep track of how our files change over time or “save” specific versions of the files. This tracking of file changes over time is known as version control.
Version control can be very useful for many reasons. For example, maybe you want to keep track of changes to a file so you can revert to a previous version if you make a mistake. This is especially useful when you are collaborating with others on a project, as everyone in the group might want to keep track of changes made or feedback given by different people.
But version control is also useful when you are working mostly alone on a project, since we humans tend to forget things. For instance, you might wonder why you made a certain change or what a file looked like at a certain point in time. With version control, you can answer both questions by going back to that version.
If a file format can internally “track changes”, as Word documents can, you may have used that feature before, maybe when getting feedback from others. At the file level (without opening the file), you may have “tracked changes” informally by saving multiple versions of a file with different names, like in the example image below.
Does this way of saving files and keeping track of versions look familiar? The above image may exaggerate what some people’s versioning looks like, but there is some truth to it: It is the most common approach to “version control”.
This “informal” form of version control isn’t ideal because it involves multiple copies of the same file. It makes it difficult to keep track of specific changes and find the right version of the files. Having multiple versions of the same file under different names, as in the image, shows how hard it is to track file changes manually and highlights the need for formal version control.
Luckily for us, there exist “formal” version control systems that automatically track changes to files. One of the world’s most popular version control systems is called Git. Git is used by millions of people around the world, including thousands of organisations. It is also used increasingly by researchers.
With Git you can create snapshots of file changes, known as commits. Each commit captures:
- What specific changes were made to the file or files.
- Who made the changes to the files.
- When they made the changes to the files.
Each commit also has a short message attached to it that can describe why the changes were made.
Git stores these commits in a history log. The history log allows you to quickly go back and explore the changes made to files, along with a message describing the changes. This is extremely useful when you revisit your own work after a long time and when you work in groups or with collaborators.
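If you are comfortable reading a little Python, the idea of a commit and a history log can be sketched as below. This is only an illustration, not how Git is implemented: the `Commit` class, the author names, and the dates are all made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commit:
    """A simplified, hypothetical stand-in for a Git commit."""

    author: str          # who made the changes
    timestamp: datetime  # when they made the changes
    message: str         # why the changes were made
    changes: dict        # what changed: file name -> new contents


# The history log is an ordered list of commits, oldest first.
# Authors and dates are invented for illustration.
history = [
    Commit("Ana", datetime(2024, 1, 8, 9, 30, tzinfo=timezone.utc),
           "Create repo with README", {"README.md": "# Recipes\n"}),
    Commit("Ben", datetime(2024, 1, 9, 14, 0, tzinfo=timezone.utc),
           "Add a new recipe file", {"soup.md": "Tomato soup\n"}),
]


def describe(log):
    """Summarise the log, newest commit first."""
    return [f"{c.timestamp:%Y-%m-%d} {c.author}: {c.message}"
            for c in reversed(log)]


for line in describe(history):
    print(line)
```

Each entry answers the same questions a real commit does: what changed, who changed it, when, and why.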
Git only tracks changes to files within a specific folder (and its sub-folders). In Git terminology, this folder is called a repository (or a repo for short). The best way to use a repository is to store all files related to a specific project, like a research project or administration files for your lab or group, in the repository (the “folder”). This way, you can track all changes made to all files in the project. It keeps things more organised and self-contained, since everything related to a project is in one place.
Any type of file can be stored in a repository, including both text-based files and non-text files like Word documents or images. However, Git can only show the specific changes made to a file if it is text-based, like a .txt file, a .csv file, or code. Since these text-based files are literally only text characters, it is easier for the computer to show the exact changes to the exact lines of text. Files like images or Word documents (which actually aren’t just text) have no “lines” to track changes on.
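To make “exact changes to the exact lines” concrete, here is a small optional sketch using Python’s built-in `difflib` module. It is not Git itself, and the two versions of the recipe are invented for this example, but the output resembles what you will see when viewing changes on GitHub: removed lines start with `-` and added lines start with `+`.

```python
import difflib

# Two hypothetical versions of a plain text recipe file, one line per item.
old_version = ["Tomato soup", "2 tomatos", "1 onion"]
new_version = ["Tomato soup", "2 tomatoes", "1 onion", "1 pinch of salt"]

# Compare the versions line by line.
diff = list(difflib.unified_diff(
    old_version, new_version,
    fromfile="soup.md (before)", tofile="soup.md (after)",
    lineterm="",
))

# Keep only the lines that were removed (-) or added (+),
# skipping the two header lines that name the files.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("\n".join(diff))
```

A line-by-line comparison like this only works because the file is plain text. An image or a Word document has no lines to compare in this way.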
To understand how powerful formal version control like Git is, consider these questions:
- How many files of different versions of a scientific document or thesis do you have lying around after getting feedback from your supervisor or co-authors?
- Have you ever wanted to test an analysis in a file but ended up creating a new one to avoid modifying the original?
- Have you ever deleted something and wished you hadn’t?
All these problems can be fixed by using formal version control! There are many good reasons to use version control, especially in science:
- More organised files and folders, since you only need one version of each file.
- Easier collaboration, because you can work on a single file/folder in a single central location.
- Transparency of work done for others to see, which can protect against accusations of fraud or misconduct.
- Claim to first discovery, since you have a time-stamped history of your work.
- Easier to share your work with others, since you can share the repository with them.
11.4 What is GitHub?
Verbally explain the differences between Git and GitHub. Briefly go over the diagram, but reinforce that we won’t cover it in this workshop. Then, highlight some simple differences between tools like OneDrive and GitHub.
There are several ways to use Git. In this workshop, we will use GitHub, which is a website that hosts Git repositories and builds on Git’s core features. This means that your Git repositories can be stored on GitHub, and you can manage your files and projects using Git through GitHub’s web interface.
Everything we do in this workshop (including storing and managing files and folders) will happen through the GitHub website. Behind the scenes, GitHub will use Git to track the changes we make.
In the simplest terms, Git is software, while GitHub is a company and website that makes it easier to use Git and share Git repositories. For beginners, GitHub’s web interface has some advantages: you commit changes immediately after editing a file, and it’s easier to view changes and file history compared to using Git alone on your computer.
While we will only be interacting with Git on GitHub during this workshop, when you feel more comfortable with the concepts, you can eventually start using Git on your computer. Using Git on your computer has the benefit of being faster (you do work locally, so don’t need to wait for the internet) and more flexible (you can do more things with Git on your computer than on GitHub). Then you can use GitHub as a place to keep backups of your repository, to track tasks, and to make use of the other features GitHub has. How you would use Git locally with GitHub would look something like the figure below.
Using GitHub on its own is a great way to get started with Git. It allows you to learn the concepts of version control and Git without needing to install anything on your computer and without needing to learn some of the more technical details of Git. Since GitHub is a website, it also makes it easier to share your work and to collaborate with others. This is one of the main reasons why GitHub is so popular.
You may notice that GitHub sounds a bit like file synching tools such as OneDrive or Dropbox. So how is GitHub different? Unlike OneDrive or Dropbox, GitHub (via Git) tracks line-level changes to files, not just file-level changes. This means you can see the specific changes made in a file, not just that it was changed. The messages you attach to commits can also help you keep track of why the changes were made.
OneDrive and Dropbox also handle conflicts in a simple way when synching between the cloud and your computer: they either create a new file with some details appended to its name or keep whichever version is newer and overwrite the other. Git and GitHub, on the other hand, handle conflicts in a more involved way: they show you the conflicting changes and let you resolve them as you want to.
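The difference between the two approaches can be sketched in a few lines of Python. This is a heavily simplified illustration, not how OneDrive, Dropbox, or Git actually work: the function names, the file contents, and the timestamps are all hypothetical, and the line-by-line comparison assumes every version has the same number of lines.

```python
def newest_wins(mine, theirs):
    """File-sync style: keep whichever whole file was saved last."""
    # ISO-formatted timestamps can be compared as plain text.
    return mine if mine["saved_at"] >= theirs["saved_at"] else theirs


def find_conflicts(base, mine, theirs):
    """Version-control style: report lines that both people changed differently.

    `base` is the shared starting version. Assumes all three versions
    have the same number of lines (a simplification for this sketch).
    """
    conflicts = []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m != b and t != b and m != t:
            conflicts.append((number, m, t))
    return conflicts


base = ["2 tomatoes", "1 onion", "salt"]
my_lines = ["3 tomatoes", "1 onion", "salt"]
their_lines = ["4 tomatoes", "1 onion", "a pinch of salt"]

kept = newest_wins(
    {"saved_at": "2024-01-09T09:00", "lines": my_lines},
    {"saved_at": "2024-01-09T09:05", "lines": their_lines},
)
conflicts = find_conflicts(base, my_lines, their_lines)

print(kept["lines"])
print(conflicts)
```

With “newest wins”, the edit to line 1 in the older copy is silently lost. With the line-level comparison, only line 1 is flagged for a person to resolve, while the change to line 3 can be accepted without any decision.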
File synching tools are really good for easily sharing files within a team or group, but they aren’t as good for collaboratively working together on files. That’s where GitHub shines. It’s built for working on files together, not just sharing them.
11.5 GitHub’s landing page
During this section, visually show and walk through the different parts of the landing page. You don’t need to go into the pages just yet though.
Now that we’ve covered some of the basics of what Git and GitHub are, let’s do some practical activities on GitHub. First, we’ll start by looking at GitHub’s landing page. This is the page you see when you first log in to GitHub. It shows you a summary of what’s happening in your GitHub account as well as quick links to common items. For this workshop, we will only focus on these items:
- Top repositories list and search bar at the top of the left sidebar. Since you are new to GitHub, you likely won’t have any (or only a few) repositories listed here. When you start to work with more repositories, this is a quick way to access them.
- Create new… button on the navigation bar on the top right. This is where you can create a new repository, as well as other items we won’t cover in this workshop.
- Issues button on the navigation bar on the top right. This is where you can view all issues you’ve created, participated in, or been assigned to. From our experience, we don’t use this item often, but it can be useful to quickly find or search for issues related to your work.
- Notifications inbox button on the navigation bar on the top right. This is where you can see all notifications about activities in your repositories, like when someone comments on an issue you made or mentions you in an issue.
We’ll return to these items later in the workshop, but for now, it’s enough for you to know where they are. Let’s go to the next section and create a new repository 🚀
11.6 Creating a new repository
Take it slow here as you create the repository since this is the first “type-along” of the workshop. Make sure to explain each option and what it does, using the table below as a guide.
Also briefly introduce Markdown and mention the .md file extension.
Now that we’ve gone over the landing page of GitHub and taken a look at some commonly used areas of GitHub, let’s create a new repository on GitHub. This is where we will store all the files and folders for the project we will be working on in this workshop.
For this workshop, we’ll create a repository for a recipe project that will contain files with recipes 😋 🍰 🍕 🍲
- Click on the Create new… button on the top right side of the navigation bar.
- Click on the New repository option in the dropdown menu.
- A new page will open up that now shows a list of options to check and text boxes to fill in. Table 11.1 describes what each option is and what to set it as for this workshop.
Option | Description | Workshop Setting |
---|---|---|
Repository owner | This is the account you are creating the repository under. If you are part of an organisation, you can also create new repositories there. | Leave it as your own personal account. |
Repository name | This is the name of the repository or project. | Name it recipes . |
Description | This is a short description of what the repository is for and what it will contain. | Write “This is a practice repository for an introductory GitHub workshop. It includes a few recipes.”. |
Public or Private | This is whether the repository is visible to everyone (public) or only to you and people you give access to (private). | Leave it as public. |
Add a README file | This is a file that is shown on the front page of the repository. It is a good place to put information about the repository. When checked, it will say something about “set main as default branch”. You can safely ignore this message. | Make sure it is checked. |
Add .gitignore | This is a file that tells Git to ignore certain files or folders. | Leave it unchecked. |
Choose a license | This is a file that tells others how they can use the files in the repository. | Leave it unchecked. |
It’s now time to click the Create repository button at the bottom of the page. After you click the button, you’ll be taken to the front page of the repository. This is where you can see all the files and folders that are currently in this repository.
In this new repository there is only one file: README.md. The README.md file is a common file included in repositories to describe what the repository contains and is used for.
The files that have the file extension .md (like this README.md file) are Markdown files. Markdown is a plain text file format (a type of file, like Word’s .docx) that’s designed to be easy to write and read. Markdown files are commonly used to write text documents. We won’t cover how to write Markdown in this workshop; we’ll only use Markdown files.
We will start to add and modify files in the repository in the next session. For now, this is the perfect time to talk about something that is fundamental to working with files and folders: file paths.
11.7 📖 Reading task: What is a file path?
Reinforce that:
- Paths are pointers to files on your computer
- They are for us humans to effectively organise and work with files
- Every file has a parent folder, and every folder may also have a parent folder
- Files and folders are separated by / or \ and that the last item in the path is either a file or a folder.
Time: ~3 minutes.
Operating systems like Windows and macOS try really hard to hide or obscure the filesystem, and ultimately file paths, from the user. This has some benefits, but also some downsides. Computers and their programs depend on file paths, so when paths are hidden, users don’t learn what they are or how to use them effectively. Then, as soon as a user needs to do even slightly deeper computer work, they encounter file paths and need to know how they work. This is especially true for Git and GitHub.
So to make sure we’re all on the same page, we’ll briefly describe what file paths are, and why they’re important to know about.
In simple terms, a path is the location of a file or folder in a filesystem. The last item in a path is either a folder, indicated by a trailing /, or a file, usually indicated by an extension like .txt or .docx. All items in the path before the last item are folders. For example:
- /Users/username/Documents/ is a path to the Documents folder, within the username folder, which is then within the Users folder.
- /Users/username/Documents/notes.txt is a path to the notes.txt file, within the Documents folder, which is within the username folder, and that finally is in the Users folder.
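As an optional illustration, Python’s built-in `pathlib` module can take a path apart into the pieces described above. We use `PurePosixPath` here so the example treats `/` as the separator on any computer. The path itself is just the example path from the text.

```python
from pathlib import PurePosixPath

# The example path from the text: a file inside three nested folders.
path = PurePosixPath("/Users/username/Documents/notes.txt")

print(path.name)    # the last item in the path (here, a file)
print(path.suffix)  # the file extension
print(path.parent)  # the folder that contains the file
print(path.parts)   # every item in the path, from the top folder down
```

Everything in `path.parts` before the last item is a folder, which matches the rule that only the final item in a path can be a file.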
When you make files for work, it’s best to organise files and folders based on the project you are working on, so that things are easy to find and kept together. This is especially important when using tools like Git and GitHub. That’s because tools like Git and GitHub work within a specific folder and treat that specific folder as a Git repository. Then, all files within that repository (folder) are relative to one another. This “relativeness” is also shown by two “special characters”:
- ..: Two dots mean the folder up one, also called the “parent folder”. In the file path /Users/username/, the ../ is the /Users/ folder, since it is one folder up from username/.
- .: One dot means the current folder. If you’re in the folder /Users/username/ and see ./Documents/, it means the Documents/ folder within the username/ folder, like so: /Users/username/Documents/.
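For those curious, the two special characters can be tried out with Python’s built-in `posixpath` module, which works with `/`-separated paths. This is only a sketch to show what `..` and `.` resolve to; the starting folder is the example folder from the text.

```python
import posixpath

# The folder we are "in" for this example.
current_folder = "/Users/username/"

# ".." points at the parent folder, one folder up.
parent = posixpath.normpath(posixpath.join(current_folder, ".."))

# "." points at the current folder, so "./Documents/" is inside it.
documents = posixpath.normpath(posixpath.join(current_folder, "./Documents/"))

print(parent)
print(documents)
```

Note that `normpath` tidies the result by dropping the trailing `/`, so the folders are printed as `/Users` and `/Users/username/Documents`.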
We’ll be working with and navigating the file path on GitHub throughout this workshop, so you will get more exposure to it as we go along.
11.8 A visual walkthrough of working with files on GitHub
Show this diagram and explain it to the learners, as well as the state of the files in the repository at each step.
Before we get into hands-on work with files on GitHub in the next session, we’ll briefly do a walkthrough of what happens when you add or modify files stored in a Git repository. In Figure 11.4, we have a hypothetical series of commits (i.e., actions or changes) done in a Git repository. Each commit is shown as a “circle” or “dot” on a line that forms the Git history. The text beside each commit is its commit message, which describes what the commit is about.
Each time we make a change to files or folders in the repository, a new commit is created.
The name “main” in the diagram is the default name of the “branch” that contains the saved commits in the Git history log. We won’t cover branches in this workshop.
When we look at the files that exist in the repository with each new commit, they look like:
Create repo with README: ‘README.md’
    recipes/
    └── README.md
Add a new recipe file: ‘toup.md’
    recipes/
    ├── README.md
    └── toup.md <- new file
Fix a typo: ‘soup.md’
    recipes/
    ├── README.md
    └── soup.md <- typo fix
Move file to new folder: ‘soup/tomato.md’
    recipes/
    ├── README.md
    └── soup/ <- new folder
        └── tomato.md <- moved file
Add a new recipe file: ‘baked-goods/cookies.md’
    recipes/
    ├── README.md
    ├── soup/
    │   └── tomato.md
    └── baked-goods/ <- new folder
        └── cookies.md <- new file
Because the Git history consists of these connected commits that each contain changes, we’re able to see exactly what was changed. You can even go back to the state of the folder or file and see the exact contents that existed at that point in time.
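The walkthrough above can also be replayed in a short, optional Python sketch. It is a simplification made up for this example (Git’s real storage is more sophisticated than lists of added and removed files), but it shows how a chain of commits lets you rebuild the state of the repository at any point in its history.

```python
# Each hypothetical commit: (message, files added, files removed).
commits = [
    ("Create repo with README", ["README.md"], []),
    ("Add a new recipe file", ["toup.md"], []),
    ("Fix a typo", ["soup.md"], ["toup.md"]),
    ("Move file to new folder", ["soup/tomato.md"], ["soup.md"]),
    ("Add a new recipe file", ["baked-goods/cookies.md"], []),
]


def files_after_each_commit(commits):
    """Replay the commits in order and record the files after each one."""
    files = set()
    snapshots = []
    for message, added, removed in commits:
        files -= set(removed)
        files |= set(added)
        snapshots.append((message, sorted(files)))
    return snapshots


snapshots = files_after_each_commit(commits)
for message, files in snapshots:
    print(f"{message}: {files}")
```

To see what the repository looked like after the third commit, for example, you only need to look at `snapshots[2]`.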
Why is this important and useful? Because we humans have imperfect memories. Having every change recorded in a history log makes it much easier to recall what you or your collaborators were doing at a specific point in time, as well as when a specific change was added and why.
11.9 💬 Discussion activity: Explain the basics of Git and GitHub
Time: ~4 minutes.
Learning is about recalling and explaining something in your own words. And since Git is such a fundamentally different way of working with and thinking about files, this discussion activity aims to help solidify what we’ve covered so far about Git and GitHub. So:
- Take ~1 minute to silently explain to yourself what you understand about the basics of Git and GitHub.
- Pair up with your neighbour and for the next 3 minutes, take turns (1.5 minutes each) explaining to each other what you understand about the basics of Git and GitHub. Try to come to a shared understanding of what it is, how to work with it, and how it’s different from other ways of working with files.
11.10 Summary
- Git is version control software that tracks changes to files in a repository. It allows you to see what changes were made, who made them, when they were made, and why they were made.
- A Git repository is a folder that contains all the files and sub-folders for a project.
- GitHub is a company and website that hosts Git repositories and adds tools to help you work with files in a repository.
- GitHub’s landing page and its navigation bar have several quick links to commonly used areas of GitHub (such as the “Create new” button, notifications inbox, and issues).
- File paths are the location of a file or folder in a filesystem.
- Each change to a file in a repository creates a new commit in the Git history log, each with its own commit message (when working on GitHub).
- Commits are connected to each other creating the history of changes made within a repository.