11. GitHub, Version Control and Collaboration

Co-authored with Arnav Sood

An essential part of modern software engineering is using version control.

We use version control because

  • Not all iterations on a file are perfect, and you may want to revert changes.

  • We want to be able to see who has changed what and how.

  • We want a uniform version scheme to do this between people and machines.

  • Concurrent editing on code is necessary for collaboration.

  • Version control is an essential part of creating reproducible research.

In this lecture, we’ll discuss how to use Git and GitHub, largely with the built-in VS Code support.

We assume that you have followed the VS Code instructions.

11.1. Setup

  1. Make sure you create an account on GitHub.com.

  2. Ensure that Git is installed (it was likely installed during the getting started steps).

  3. Set up your Git username, and change the default line endings if on Windows:

    1. Open a terminal (on Windows you can use PowerShell or the “Git Bash” installed in the previous step).

    2. Run the following, replacing the email and name in the final lines with your own. The first two lines are not required on Linux and macOS.

      git config --global core.eol lf
      git config --global core.autocrlf false
      git config --global user.email "you@example.com"
      git config --global user.name "Your Name"   
      git config --global github.user "GITHUBUSERNAME"   
      
  4. Ensure that VS Code is installed

  5. Install the GitLens extension.

  • Optional, but highly recommended.

  • It provides an enormous amount of detail on exact code changes within GitHub repositories (e.g., seamless information on the time and individual who last modified each line of code).

11.1.1. Git vs. GitHub vs. Git Clients

To understand the relationship

  • Git is an infrastructure for versioning and merging files (it is not specific to GitHub and does not even require an online server).

  • GitHub provides an online service to coordinate working with Git repositories, and adds some additional features for managing projects.

  • GitHub is the market leader for open source projects and Julia, but there are other options, e.g. GitLab and Bitbucket.

We will use the built-in VS Code Git and GitHub support in this lecture, but you may consider using alternatives

  • GitHub Desktop is a clean and simple GUI for git, which is often useful in conjunction with VS Code.

  • GitKraken is a superb, specialized tool which makes many advanced operations intuitive.

  • Or, you may directly use the Git command line, which can be convenient for simple operations (e.g. git clone https://github.com/QuantEcon/lecture-julia.notebooks would clone the notebook repository), but tends to be harder for more advanced operations.

Since these lecture notes are intended to provide a minimal path to using the technologies, here we will conflate the workflow of these distinct products.

11.2. Basic Objects

11.2.1. Repositories

The fundamental object in GitHub is a repository (or “repo”) – this is the master directory for a project.

One example of a repo is the QuantEcon Expectations.jl package.

On the machine, a repo is a normal directory, along with a subdirectory called .git which contains the history of changes.

11.2.2. Commits

Git stores history as a sequence of changes to text, called commits.

Here is an example of a commit, which modifies instructions on using an extension in these lecture notes.

In particular, commits have the following features

  • An ID (formally, an “SHA-1 hash”)

  • Content (i.e., a before and after state)

  • Metadata (author, timestamp, commit message, etc.)

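Note: Conceptually, what a commit records is the set of changes you make to text relative to the previous commit.

Git stores this history compactly (content that did not change is shared between commits, and the data is compressed), which is a key reason why it can hold long and complicated histories without consuming massive amounts of disk space.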
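To make the three features of a commit concrete, here is a minimal sketch that creates a throwaway repository and inspects its first commit. The temporary directory, the file name notes.txt, and the identity "Your Name" are placeholders chosen for illustration, not part of the lecture's example repository.

```shell
# Work in a throwaway directory so nothing on your machine is affected
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Placeholder identity, stored for this demo repository only
git config user.name "Your Name"
git config user.email "you@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# ID: a 40-character hexadecimal SHA-1 hash
commit_id="$(git rev-parse HEAD)"
echo "ID:      $commit_id"

# Metadata: author and commit message
author="$(git log -1 --format=%an)"
message="$(git log -1 --format=%s)"
echo "Author:  $author"
echo "Message: $message"

# Full view: metadata followed by the lines added or removed
git show HEAD
```

The ID will differ on every machine, since it depends on the timestamp and author as well as the content.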

11.2.3. Common Files

In addition, each GitHub repository usually comes with a few standard text files

  • A .gitignore file, which lists files/extensions/directories that Git shouldn’t try to track (e.g., LaTeX compilation byproducts).

  • A README.md file, a Markdown file that GitHub displays by default as the homepage when accessing the repository online.

  • A LICENSE.txt file, which describes the terms under which the repository’s contents are made available.

For an example of all three, see the Expectations.jl repo.
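As a small illustration of how a .gitignore behaves, the following sketch writes a few ignore patterns and asks Git which files they match. The patterns and file names (paper.tex, paper.aux, scratch.tmp) are hypothetical examples, and the repository lives in a temporary directory.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Ignore LaTeX byproducts and temporary files (illustrative patterns)
printf '%s\n' '*.aux' '*.log' '*.tmp' > .gitignore
touch paper.tex paper.aux scratch.tmp

# check-ignore prints the paths that match an ignore rule
ignored="$(git check-ignore paper.tex paper.aux scratch.tmp)"
echo "Ignored files:"
echo "$ignored"

# status lists only the untracked files that are not ignored
status="$(git status --porcelain)"
echo "Files Git would track:"
echo "$status"
```

Here paper.tex and the .gitignore itself show up as candidates for tracking, while the .aux and .tmp files do not.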

11.3. Individual Workflow

In this section, we’ll describe how to use GitHub to version your own projects.

Much of this will carry over to the collaborative section.

11.3.1. Creating a Repository

In general, we will always want to make repos for new projects using the following dropdown

../_images/git-makerepo.png

We can then configure repository options as such

../_images/git-makerepo-full.png

In this case, we’re making a public repo github.com/USERNAME/example_repository where USERNAME is your GitHub account name. The options chosen are:

  • Add in a README.md.

  • License under the MIT open-source License.

  • Ignore Julia compilation byproducts in the .gitignore.

  • Leave off support for the Marketplace Apps Codecov, which we will discuss further in the testing lecture.

Note

You can also add an existing folder as a new repository on GitHub by using the VS Code features to initialize and publish a repository. Otherwise, the instructions are more involved.

11.3.2. Cloning a Repository

The next step is to get this to our local machine. If you click on the <> Code button on the repository’s website, you will see a dropdown such as

../_images/git-clone.png

This dropdown gives us a few options

  • The copy button below Clone with HTTPS copies a URL that can be used from either the command line or other tools.

  • Open in Desktop will open the repository in the GitHub Desktop application, if you installed it.

  • Download Zip will download the directory without the .git subdirectory (avoid this option, as it defeats the purpose of version control).

We will download the repository using the built-in VS Code support.

  1. Copy the https URL in that dropdown (e.g. https://github.com/USERNAME/example_repository.git).

  2. Start VS Code.

  3. Use Ctrl+Shift+P to open the command bar, and choose > Git: Clone

  4. At this point, you can paste in the copied URL or choose Clone from GitHub and then it will let you select your repositories after logging in.

    ../_images/vs-code-clone.png

  5. Select a location (e.g. c:\users\USERNAME\GitHub). The repository will be cloned into a folder inside it (e.g. c:\users\USERNAME\GitHub\example_repository), which holds both the files themselves and the version information, i.e. all of the information associated with this repository.

  6. After the repository is cloned, you can choose to Open in a New Window.

    ../_images/vs-code-done-clone.png

You will see the automatically generated files for the LICENSE, .gitignore and README.md.

Note

To manually clone this to your desktop, you can start a terminal and use git clone https://github.com/USERNAME/example_repository.git within the directory you want to clone it to. After cloning, you can open the folder within VS Code by either

  • Within a terminal on your operating system, navigate to that directory and type code .

  • On Windows if you installed VS Code with the appropriate option, right click on the folder and choose Open with Code - trusting the authors as required on opening the folder.

  • In the VS Code Menu, choose File/Open Folder....

11.3.3. Making, Committing, and Pushing Changes

Now that we have the repository, we can start working with it.

Within VS Code, make the following changes:

  1. Open the README.md and add some text.

  2. Add a new file called some_file.txt with some text in it. You can do this with the menus, or by right clicking in the Files panel and selecting “New File”.

  3. Add another new file called garbage_file.tmp with some text in it.

  4. Finally, in the .gitignore, add *.tmp at the end.

Your editor should look something like

../_images/vs-code-edits-1.png

Note that the panel on the left hand side is highlighted with 3 changes. Select that tab to see the current modifications relative to the current version on GitHub.

This shows three changes:

  1. README.md is modified.

  2. .gitignore is modified.

  3. some_file.txt is a new file.

Note that garbage_file.tmp is not listed, as the *.tmp extension was ignored in the .gitignore. If you click on a file, such as the README.md in this panel, it will open up the file to show all of the changes relative to the last commit. For example,

../_images/vs-code-edits-2.png

Let us push these changes to GitHub. Add text in the “Message” in this panel, and then click on the checkmark to commit it.

As git is a decentralized version control system, this change is now only local to your machine. You can make a variety of changes locally and only push to GitHub when you are ready.

To push these to the server, you can use the > Git: Push command or you can click on the bottom bar of VS Code, which should show that one commit is ready to be uploaded and none are ready to be downloaded.

../_images/vs-code-edits-3.png
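For readers who prefer the terminal, the same stage, commit, and push cycle can be sketched with the Git command line. To keep the sketch self-contained, a local bare repository stands in for GitHub; that stand-in, the temporary directory, and the placeholder identity are assumptions of this example rather than part of the VS Code workflow.

```shell
work="$(mktemp -d)"
# A bare repository plays the role of the GitHub server in this sketch
git init -q --bare "$work/server.git"
git clone -q "$work/server.git" "$work/example_repository"
cd "$work/example_repository"
git config user.name "Your Name"
git config user.email "you@example.com"

# The same edits as in the text
echo "# example_repository" > README.md
echo "some text" > some_file.txt
echo "scratch" > garbage_file.tmp
echo "*.tmp" > .gitignore

git add .                       # stage everything that is not ignored
git commit -q -m "Add files"    # the commit exists only locally so far
tracked="$(git ls-files)"
echo "$tracked"

git push -q origin HEAD         # upload the current branch to the server
server_commits="$(git --git-dir="$work/server.git" rev-list --count --all)"
echo "Commits on the server: $server_commits"
```

As in the VS Code panel, garbage_file.tmp never appears among the tracked files because of the *.tmp rule.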

11.3.4. Exploring Commits in VS Code and on the GitHub Website

If you refresh your web browser with the GitHub repository open, you will see the changes. The page now shows that multiple commits have been made to this project, as well as the last person to modify it.

../_images/vs-code-edits-4.png

This functionality will help you track down changes in a project, and in particular, provide tools to track down when functionality may have stopped working or parameters were changed in a project.

On that page, either choose the description of the commit to display it, or choose the entire list (e.g. 2 commits) and select the most recent.

This shows a summary of all of the changes between this commit and the last.

../_images/vs-code-edits-5.png

This interface lets you explore all of the lines of code that were modified between different commits to track down changes.

Finally, with the GitLens extension, you can see this information within the files themselves. Open the README.md file, and move your typing cursor to one of the lines within the file.

You will then see a nearly transparent description of (1) who made the last modification to this line; (2) when it occurred; and (3) the message on the commit that modified it. If you further hover over this you can access even more information on the appropriate commit

../_images/vs-code-edits-6.png

You can see more detail on that commit in a variety of ways, such as clicking on that popup. One convenient approach is in the main Explorer pane (not the Git pane), which has an expandable Timeline. Select this for the README.md to compare the current version to another in the timeline.

../_images/vs-code-edits-7.png

11.3.5. Pulling Changes and the Online Editor

The opposite of pushing changes is to pull changes made externally and previously pushed to GitHub.

To see this workflow, we will make an external modification and bring the changes locally.

Start in the GitHub webpage for your repository and choose the README.md file.

You can edit this online by choosing the pen icon, or on some platforms by just typing a . which opens an online version of VS Code on that repository. The same editor can be reached by changing the URL from github.com to github.dev, e.g. from https://github.com/USERNAME/example_repository/blob/main/README.md to https://github.dev/USERNAME/example_repository/blob/main/README.md.

Here is what that UI looks like in that case after editing the text and choosing the Git pane on the webpage (just as in any other VS Code installation)

../_images/vs-code-edits-9.png

As with the desktop VS Code, we then need to commit this with a message. However, as it is online you will not be able to have local changes, and it commits without a manual “Push”.

Go back to the desktop VS Code, and you will see that the Git bar at the bottom now shows an incoming change.

../_images/vs-code-edits-10.png

Notice that the direction of the arrow is the opposite of when we made local modifications. Whereas moving local commits to the server is called a “Push”, bringing external changes down to your desktop is called a “Pull”.

But before we pull these changes, we will show how Git can automatically merge them (often on the same file, but at different lines of code).

On your local VS Code, open the README.md and change the title from # example_repository to # example_repository_modified. Then save and commit this change with a commit message such as Local Modification. It is important that you modified the top line, and not the same one that you changed in the online editor.

You will notice that at the bottom it now shows one commit coming from the server, and one going back up.

Now click on that icon on the bottom of the VS Code editor, which will do a Pull and Push of these changes. Assuming that you were careful not to modify the same line of code, it will determine that these two changes do not clash, and both commits will be added locally and on the server.

../_images/vs-code-edits-11.png
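The automatic merge can also be reproduced entirely on the command line. In this sketch, a local bare repository stands in for GitHub and a second clone plays the role of the online editor; the directory names, file contents, and identity are illustrative assumptions.

```shell
# Placeholder identity used for every commit in this sketch
export GIT_AUTHOR_NAME="Your Name" GIT_AUTHOR_EMAIL="you@example.com"
export GIT_COMMITTER_NAME="Your Name" GIT_COMMITTER_EMAIL="you@example.com"

work="$(mktemp -d)"
git init -q --bare "$work/server.git"      # stand-in for GitHub

# Desktop clone: create a README with several lines and push it
git clone -q "$work/server.git" "$work/desktop"
cd "$work/desktop"
printf '%s\n' '# example_repository' '' 'Some text.' '' 'Last line.' > README.md
git add README.md
git commit -q -m "Initial commit"
branch="$(git rev-parse --abbrev-ref HEAD)"
git push -q origin "$branch"

# Second clone (the "online editor"): change only the last line
git clone -q "$work/server.git" "$work/online"
cd "$work/online"
printf '%s\n' '# example_repository' '' 'Some text.' '' 'Edited online.' > README.md
git commit -q -am "Online modification"
git push -q origin "$branch"

# Back on the desktop: change only the title, then pull and push
cd "$work/desktop"
printf '%s\n' '# example_repository_modified' '' 'Some text.' '' 'Last line.' > README.md
git commit -q -am "Local Modification"
git pull -q --no-rebase --no-edit origin "$branch"   # merge the server's commit
git push -q origin "$branch"

first_line="$(sed -n 1p README.md)"
last_line="$(sed -n 5p README.md)"
cat README.md
```

Because the two edits touch lines that are far apart, the merged README.md contains both the new title and the line edited "online".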

11.3.6. Discarding Changes

A common scenario with Git is that you are making temporary local modifications and want to discard them prior to updating from the server.

Note

The .gitignore is very useful for ensuring that some files are always ignored, for example temporary files in computations, or .pdf files which coauthors will compile themselves (or which are compiled automatically with a GitHub Action).

To see this workflow prior to making a commit:

  1. Save a change to the README.md

  2. Open the Git pane, which will show the one modification.

  3. Right click on the modification you wish to discard (can be file-by-file).

../_images/vs-code-edits-8.png

By selecting Discard Changes you can revert back to the last commit that had been made without any local modifications.
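A rough command-line counterpart of Discard Changes is git checkout -- FILE (newer versions of Git also offer git restore FILE). The sketch below uses a throwaway repository with placeholder contents.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "you@example.com"

echo "# example_repository" > README.md
git add README.md
git commit -q -m "Initial commit"

# A temporary modification that has not been committed
echo "temporary experiment" >> README.md
before="$(git status --short)"
echo "Before: $before"

# Discard it, returning the file to the state of the last commit
git checkout -- README.md
after="$(git status --short)"
content="$(cat README.md)"
echo "After discarding, README.md contains: $content"
```

Note that the discarded text is not kept anywhere, since it was never committed.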

11.3.7. Reverting Commits

On the other hand, if you have already made a commit, then there is a record of this change in the history which cannot be removed directly. You can, however, easily revert that particular change.

For example, open some_file.txt, make a change, and commit/push the modification.

To restore an older version, with GitLens installed, go to the FILE HISTORY in the source control pane, and right click on the older version you wish to keep.

../_images/vs-code-edits-12.png

Choose Restore (Checkout), which will add a modification into the Staged Changes.

Then provide a commit message, and push to the server.

This will not remove the history of the older commit, but will instead produce the opposite changes required to restore it.
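On the command line, one way to sketch the same idea is to check out the older version of a single file, which stages it, and then commit. The repository, file contents, and commit messages below are placeholders; the related command git revert COMMIT creates a new commit with the opposite changes of a whole commit.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "you@example.com"

echo "version 1" > some_file.txt
git add some_file.txt
git commit -q -m "Add some_file"

echo "version 2" > some_file.txt
git commit -q -am "Change some_file"

# Restore the file as it was one commit ago; the result is staged
git checkout HEAD~1 -- some_file.txt
git status --short
git commit -q -m "Restore older version of some_file"

content="$(cat some_file.txt)"
n_commits="$(git rev-list --count HEAD)"
git log --oneline
```

All three commits remain in the log: the history grows rather than being rewritten.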

See the VS Code documentation for more features.

11.3.8. Merge Conflicts

In the previous examples, we used the online editor to make external changes, and Git merged them automatically. At other times an automatic merge will not be possible.

To demonstrate this, follow the same instructions to modify the top line in the README.md with the online editor, and commit the change

../_images/vs-code-edits-13.png

Then in your desktop, change the same line of code and commit it, but don’t push the change

../_images/vs-code-edits-14.png

As before, at the bottom of the window, it shows a commit going to the server, and another coming down. Click on that button to push and pull the changes.

../_images/vs-code-edits-14.png

As expected, it was unable to automatically merge the changes, and requires manual intervention to merge.

It should bring you to the editor to deal with these merge conflicts

../_images/vs-code-edits-15.png

Since the change on the server occurred before your local change, you will need to address this conflict before pushing your change. While you can manually modify the files, the user interface lets you navigate and choose which changes to accept.

In that view, choose “Accept Current Change” within the editing screen, right above the line of code that is highlighted. This will use your local change and overwrite the one on the server. Or choose “Accept Incoming Change” to use the server’s version.

../_images/vs-code-edits-16.png

An alternative workflow is to right click on the file, and choose Accept All Current or Accept All Incoming to choose one version of the file without going through individual decisions.

After modifying:

  1. Save the file if you have resolved the merge conflict.

  2. Choose the + next to the modified file in the source control pane, or right click on the file and choose Stage Changes.

  3. Add a commit message, commit the file

  4. Do a Push to synchronize with the server.

See this youtube video for more details.
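To see what a conflict looks like underneath the VS Code interface, the following sketch provokes one in a throwaway repository. For simplicity, a second branch named incoming (a hypothetical name) stands in for the commit made on the server, and the conflict is resolved by keeping the local version, which corresponds to “Accept Current Change”.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "you@example.com"

echo "# example_repository" > README.md
git add README.md
git commit -q -m "Initial commit"
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Stand-in for the change made on the server
git checkout -q -b incoming
echo "# title from the server" > README.md
git commit -q -am "Online modification"

# Local change to the same line
git checkout -q "$main_branch"
echo "# title from my desktop" > README.md
git commit -q -am "Local modification"

# The merge stops with a conflict; '|| true' lets the script continue
git merge incoming > /dev/null || true
conflicted="$(cat README.md)"
echo "$conflicted"    # both versions, separated by conflict markers

# Resolve by keeping the current (local) version, then stage and commit
git checkout --ours -- README.md
git add README.md
git commit -q -m "Resolve merge conflict"
resolved="$(cat README.md)"
echo "Resolved: $resolved"
```

The lines between <<<<<<< and ======= are the current change, and those between ======= and >>>>>>> are the incoming change.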

Note

An important “break-glass-in-emergency” feature with git is to completely revert all local changes and reset to the latest commit on the server.

To do this, in a terminal within the repository execute git reset --hard origin/main (or master if the primary branch is called master rather than main). But remember, this will erase all local changes, so back files up locally if required.
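A minimal sketch of this reset, again with a local bare repository standing in for GitHub and with placeholder file contents, looks as follows. The branch name is read from the repository rather than assumed to be main or master.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/server.git"      # stand-in for GitHub
git clone -q "$work/server.git" "$work/example_repository"
cd "$work/example_repository"
git config user.name "Your Name"
git config user.email "you@example.com"

echo "published text" > README.md
git add README.md
git commit -q -m "Initial commit"
branch="$(git rev-parse --abbrev-ref HEAD)"
git push -q origin "$branch"

# Local work to throw away: one unpushed commit plus an uncommitted edit
echo "local commit" >> README.md
git commit -q -am "Local experiment"
echo "uncommitted edit" >> README.md

# Fetch the latest state of the server and reset to it
git fetch -q origin
git reset -q --hard "origin/$branch"

content="$(cat README.md)"
status="$(git status --short)"
echo "README.md now contains: $content"
```

Both the unpushed commit and the uncommitted edit are gone from the working files afterwards.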

11.4. Collaborative Work

We have already seen how Git features (e.g. automatic merging of changes, and tools to manage conflicts where it fails) provide tools for collaboration.

Here we will look at further GitHub and VS Code functionality to support collaboration.

11.4.1. Adding Collaborators

First, add a collaborator to the USERNAME/example_repository repository we created earlier.

We can do this by choosing Settings then Manage Access, and finally Invite a Collaborator.

../_images/git-collab.png

Adding someone as a collaborator is required for access to private repositories, and can allow them direct access to changing the repository for public ones.

11.4.2. Project Management

In addition, having individuals as collaborators enables further project management features.

The main feature is an issue, which we can create from the issues tab.

You should see something like this

../_images/git-issue.png

Let’s unpack the different components

  • The assignees dropdown optionally lets you select people tasked to work on the issue.

  • The labels dropdown lets you tag the issue with labels visible from the issues page, such as “high priority” or “feature request”.

  • It’s possible to tag other issues and collaborators (including in different repos) by linking to them in the comments – this is part of what’s called GitHub-Flavored Markdown.

You can see open issues at a glance from the general issues tab

../_images/git-issue-tab.png

The checkboxes are common in GitHub to manage project tasks.

11.4.3. Reviewing Code

Whenever people push to a project you’re working on, you’ll receive an email notification.

You can review individual commits by opening a commit and commenting.

../_images/git-review.png

You can also click on the line number in the description of the commit differences to have discussions about a specific line of code. Throughout, if you have a collaborator, you can refer to them by @username, replacing username with their GitHub name.

../_images/git-review-2.png

11.5. Collaboration via Pull Request

One of the defining features of GitHub is that it is the dominant platform for open source code, which anyone can access and use.

However, while anyone can make a copy of the source code, not everyone has access to modify the particular version stored on GitHub.

A maintainer (i.e. someone with “write” access to directly modify a repository) might consider different contributions and “merge” the changes into the main repository if the changes meet their criteria.

A pull request (“PR”) allows any outsiders to suggest changes to open source repositories.

A PR requests the project maintainer to merge (“pull”) changes you’ve worked on into their repository.

There are a few different workflows for creating and handling PRs, which we’ll walk through below.

Note: If the changes are for a Julia Package, you will need to follow a different workflow – described in the testing lecture.

11.5.1. Quick Fixes

GitHub’s website provides an online editor for quick and dirty changes, such as fixing typos in documentation.

For example, you can navigate to the source for these lecture notes in https://github.com/QuantEcon/lecture-julia.myst and navigate to the README.md.

From there, either click the small pencil to the upper right of the text or type . to launch the web editor (which launches the web editor with https://github.dev/QuantEcon/lecture-julia.myst/blob/main/README.md).

For example, choosing the pencil might look like

../_images/git-quick-pr.png

Here you can give the name for the new branch and provide a description of your changes for review by maintainers.

This will create a “fork” of the entire repository, make your change as a commit, and then propose this change to the repository you have forked from (i.e., the Pull Request).

But what if we want to make more in-depth changes?

11.5.2. No-Access Case

A common problem is when we don’t have write access to the repo in question (i.e. we can’t directly modify it).

In that case, click the “Fork” button that lives in the top-right of every repo’s main page.

This will copy the repo into your own GitHub account.

For example, we can fork the source for these lectures from this repo to our own account (e.g. to https://github.com/USERNAME/lecture-julia.myst).

In fact, if you created a PR for a change in the web GUI, it may already be forked to your account.

Regardless, you should have a new repository in your account with the same name but different URL, along with a special icon to indicate that it’s a fork. It will also say where it is forked from, and summarize the number of commits that are different.

../_images/git-clone-fork.png

We then can clone this repository to our desktop. Follow the instructions to clone the forked repository to your desktop (e.g. in VS Code, > Git: Clone, then choose https://github.com/USERNAME/lecture-julia.myst.git for your account, and then Open in New Window).

In order to more easily manage changes, we need to create a new branch. A branch is a separate sequence of commits that diverges from the main branch at some point (and may later be merged back in). It allows you to manage a sequence of separate changes as a coherent unit.

Click on the bottom left of the screen where it says main to create or select a new branch, and then choose New Branch and title it readme-mod.

../_images/new-branch.png

You will note that the bottom left hand corner now has chosen that branch rather than the main one. You can use that to easily switch between them in the editor.

../_images/new-branch-2.png

Open the README.md file and make a change to it in the branch, then save and commit with a message. At this point, the branch and your commit are still only local.
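The branch creation and local commit can be sketched on the command line as well. This example uses a fresh throwaway repository rather than a real fork, so the README contents and identity are placeholders; only the branch name readme-mod comes from the text.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "you@example.com"

echo "# lecture notes" > README.md
git add README.md
git commit -q -m "Initial commit"
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Create the new branch and switch to it
git checkout -q -b readme-mod
echo "A change made on the branch." >> README.md
git commit -q -am "Modify README"

current="$(git rev-parse --abbrev-ref HEAD)"
on_branch="$(git rev-list --count readme-mod)"
on_main="$(git rev-list --count "$main_branch")"
git branch                      # the * marks the branch you are on
echo "Commits on readme-mod: $on_branch, on $main_branch: $on_main"
```

The main branch is untouched by the new commit, which is what lets the change be reviewed as a unit.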

To push onto the server, choose the arrow next to the branch name to publish changes.

../_images/new-branch-3.png

At this point, you can choose whether you want this branch to be related to the origin (i.e., your fork) or the original repository upstream. Since this is intended to propose a change to the public repository, you should choose upstream.

After selecting, it will ask you whether you want to create a Pull Request. Choose Yes, and then fill in the information for the new pull request and Create it.

../_images/new-branch-4.png

You can Exit Review Mode after it has been created at which point it may switch to your main branch.

Use the branch selection on the lower left hand of the editor to change branches as you wish.

Finally, for the original repository you forked from, they will now see this as a proposed change, as below

../_images/new-branch-5.png

The maintainers of the repository can choose to accept the change, suggest modifications, comment on individual lines of code, etc.

If they like the changes, they may choose to “Merge the Pull Request”. This means that all of the commits proposed from your branch will be applied on top of the main branch on that repository.

If there are conflicts (i.e., you are changing files which have been modified on the main branch since you forked) then you may need to rebase your branch and pull request. This means replaying the commits on your branch on top of all of the changes that have happened since your fork.

While rebasing can be very complicated, for simple cases you can use the > Git: Rebase (Branch) command and choose upstream/main to get any of the recent changes. For more advanced resolution of conflicts and rebasing, GitKraken and GitLens have more elaborate features.
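For a simple case without conflicts, a rebase can be sketched in a throwaway repository as below. Here the local main branch stands in for upstream/main, and the LICENSE file and commit messages are illustrative assumptions.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "you@example.com"

echo "# lecture notes" > README.md
git add README.md
git commit -q -m "Initial commit"
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Our proposed change lives on its own branch
git checkout -q -b readme-mod
echo "My change." >> README.md
git commit -q -am "Modify README"

# Meanwhile, the main branch gains a commit touching another file
git checkout -q "$main_branch"
echo "MIT" > LICENSE
git add LICENSE
git commit -q -m "Add LICENSE"

# Replay the branch's commit on top of the updated main branch
git checkout -q readme-mod
git rebase -q "$main_branch"

parent="$(git rev-parse HEAD~1)"
main_tip="$(git rev-parse "$main_branch")"
n_commits="$(git rev-list --count HEAD)"
git log --oneline
```

After the rebase, the branch's commit sits directly on top of the latest commit of the main branch, so it can be merged without diverging history.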

Creating a pull request is not like bundling up your changes and delivering them, but rather like opening an ongoing connection between two repositories, that is only severed when the PR is closed or merged.

11.5.3. Write Access Case

As you become more familiar with GitHub, and work on larger projects, you will find yourself making PRs even when it isn’t strictly required.

If you are a maintainer of the repo (e.g. you created it or are a collaborator) then you don’t need to create a fork, but will rather work with a git branch.

Branches in git represent parallel development streams (i.e., sequences of commits) that the PR is trying to merge.

For example, back on our example_repository, select the branch name on the bottom left corner of VS Code, and make a new branch (e.g. readme-mod)

../_images/new-branch-6.png

Modify the README.md file, then commit the change to your branch. As before, this only modifies your local machine, so you will need to publish the branch by clicking on the arrow next to the branch name.

While you could directly add a PR at this point, choose not to create one so that we can create it on the GitHub website.

If we refresh the GitHub website, it prompts us to create a PR.

../_images/new-branch-7.png

We can choose Compare and Pull Request at this point, giving the PR a name and a description, or we can create one later with the branches tab on the repository. If we create the PR, we will see it in our Pull Requests tab at the top of the webpage

../_images/new-branch-8.png

Typically these would be left open while a cohesive feature is developed or a bug is fixed in your code, and then merged.

The Reviewers can look at the PR, select it on their desktops (by changing to the appropriate branch), and comment on the code.

../_images/new-branch-9.png

Throughout, the main branch may be modified through other branches and commits, so the PR could become out of sync. If so, then rebasing and conflict resolution are required, as discussed above.

However, if no conflicts occur, and the commits from the branch can be merged into the main branch, then you can choose to Merge Pull Request.

../_images/new-branch-10.png

The key difference in the options is that the Squash and Merge will combine all of the individual commits in this branch into a single commit before merging into the main branch.
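The effect of squashing can be approximated locally with git merge --squash, as in this sketch. It runs in a throwaway repository with placeholder commits and is only an analogy for what the GitHub button does on the server.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "you@example.com"

echo "# example_repository" > README.md
git add README.md
git commit -q -m "Initial commit"
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Two separate commits on the branch
git checkout -q -b readme-mod
echo "First edit." >> README.md
git commit -q -am "First edit"
echo "Second edit." >> README.md
git commit -q -am "Second edit"

# Combine them into a single commit on the main branch
git checkout -q "$main_branch"
git merge -q --squash readme-mod    # stages the combined changes
git commit -q -m "Modify README (squashed)"

on_main="$(git rev-list --count HEAD)"
on_branch="$(git rev-list --count readme-mod)"
edits="$(grep -c 'edit' README.md)"
echo "Commits on $main_branch: $on_main, on readme-mod: $on_branch"
```

The main branch ends up with both edits but only one new commit, which keeps its history short.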

The UI then prompts you to remove the branch. After the PR has been merged, you will typically want to delete that branch to avoid accidentally modifying it.

Finally, we see that this commit is now listed on the main branch

../_images/new-branch-11.png

11.5.4. Julia Package Case

One special case is when the repo in question is actually a Julia project or package.

We cover that (along with package workflow in general) in the testing lecture.

11.6. Additional Resources and Troubleshooting

You may want to go beyond the scope of this tutorial when working with GitHub.

Here are some resources to help

11.6.1. Command-Line Basics

Git also comes with a set of command-line tools.

They’re optional, but many people like using them.

Furthermore, in some environments (e.g. JupyterHub installations) you may only have access to the command line.

  • On Windows, installing Git will have installed a program called Git Bash, which provides these tools along with a general Linux-style shell.

  • On Linux/MacOS, these tools are integrated into your usual terminal.

To open the terminal in a directory, either right click and hit “open git bash” (in Windows), or use Linux commands like cd and ls to navigate.

See here for a short introduction to the command line.

As above, you can clone by grabbing the repo URL (say, GitHub’s site-policy repo) and running git clone https://github.com/github/site-policy.git.

From here, you can get the latest files on the server by cd-ing into the directory and running git pull.

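When you pull from the server, Git will not overwrite modified files that you have not committed; if a pull would touch them, it stops and reports the problem instead, so pulling does not silently discard local changes.

Instead, to do a hard reset of all files and overwrite any of your local changes, you can run git reset --hard origin/main (or origin/master if the primary branch is called master rather than main).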

11.7. Exercises

11.7.1. Exercise 1a

Follow the instructions to create a new repository for one of your GitHub accounts. In this repository

  • Take the code from one of your previous assignments, such as Newton’s method in Introductory Examples (either as a .jl file or a Jupyter notebook).

  • Put in a README.md with some text.

  • Put in a .gitignore file, ignoring the Jupyter files .ipynb_checkpoints and the project files, .projects.

11.7.2. Exercise 1b

Pair-up with another student who has done Exercise 1a and find out their GitHub ID, and each do the following

  • Add the GitHub ID as a collaborator on your repository.

  • Clone the repositories to your local desktop.

  • Assign each other an issue.

  • Submit a commit from VS Code which references the issue by number.

  • Comment on the commits.

  • Ensure you can run their code without any modifications.

11.7.3. Exercise 1c

In pairs, using the results of Exercise 1b, examine a merge conflict by editing the README.md file of a repository for which you have both been set up as collaborators.

Start by ensuring there are multiple lines in the file so that some changes may have conflicts, and some may not.

  • Clone the repository to your local desktops.

  • Modify different lines of code in the file and both commit and push to the server (prior to pulling from each other), and see how it merges things “automatically”.

  • Modify the same line of code in the file, and deal with the merge conflict.

11.7.4. Exercise 2a

Just using GitHub’s web interface, submit a Pull Request for a simple change of documentation to a public repository.

The easiest may be to submit a PR for a typo in the source repository for these notes, i.e. https://github.com/QuantEcon/lecture-julia.myst.

Note: The source for that repository is in .md files, but you should be able to find spelling mistakes/etc. without much effort.

11.7.5. Exercise 2b

Following the instructions for forking and cloning a public repository to your local desktop, submit a Pull Request to a public repository.

Again, you could submit it for a typo in the source repository for these notes, i.e. https://github.com/QuantEcon/lecture-julia.myst, but you are also encouraged to instead look for a small change that could help the documentation in another repository.

If you are ambitious, then go to the Exercise Solutions for one of the Exercises in these lecture notes and submit a PR for your own modified version (if you think it is an improvement!).