What is a Github CFOR

A CFOR (pron. C4) is a Cross-Fork Object Reference. CFORs are a feature of how Github handles forks and pull requests, but they can result in information leakage. The term was coined in a post by Truffle Security on Github data earlier in 2024. In essence, secrets committed accidentally to a fork of a repository remain visible in the original, even if the fork is deleted.

CFOR Replica Building

This walkthrough shows what is going on server-side, by creating a replica of Github in a Docker container, to study CFORs in more detail.

Setting Up the Environment

Dockerfile

First, let’s create a container to use as our replica building environment. In a new folder, create this Dockerfile:

FROM ubuntu:24.04

# Set environment variables to reduce interaction during installation
ENV DEBIAN_FRONTEND=noninteractive

# Update the package manager and install necessary tools
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y git sudo vim;

# Create user
RUN useradd -ms /bin/bash github && \
    echo 'github ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

# Switch to the non-root user
USER github
WORKDIR /home/github

ENTRYPOINT ["/bin/bash"]

Now get a shell in the playground:

# Build our Docker lab
docker build --tag github-playground .
docker run --interactive --tty --name github-playground-container github-playground

Creating a Remote and a Local Clone

Setting Up Our Main Remote Repo

In our new container, let’s set up a main repo on ‘github’. This will represent an actual Github project in our replica.

# Make a replica Github with a repository in it
mkdir --parents ~/github/main-repo
cd ~/github/main-repo/
git init --bare --initial-branch=master

The remote repo is ‘bare’, meaning there is no actual copy of the code on the filesystem here.
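
To make the idea of a bare repo concrete, here is a minimal sketch that creates one in a scratch directory and inspects it. The `lab` directory and the `bare.git` name are assumptions made for illustration; they are not part of the walkthrough's layout.

```shell
# Hypothetical scratch location, separate from the walkthrough's ~/github tree
lab="$(mktemp -d)"
git init --quiet --bare --initial-branch=master "$lab/bare.git"

# Ask git directly whether the repository is bare
is_bare="$(git -C "$lab/bare.git" rev-parse --is-bare-repository)"
echo "bare: $is_bare"

# A bare repo keeps HEAD, objects/ and refs/ at its top level:
# there is no working tree and no .git subdirectory
ls "$lab/bare.git"
```

The listing shows only git's internal files, which is why nothing resembling the project's source code appears on the 'server' side.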

Setting the Replica Github’s Garbage Collection

Now, we need to make the garbage collection realistic. ‘Garbage’, in this context, means commits which this walkthrough calls ’loose objects’: commits that are no longer reachable from any ref. We’ll study this in more detail later in the tutorial, but essentially some commands, such as using git branch -d to delete a branch pointer, or git rebase, result in some older commits not being used anymore. That is to say that they are no longer an ancestor of any commit which is a branch tip.

If that isn’t yet clear enough, we will be creating and studying such ’loose object’ commits later in the tutorial.

For now, let’s simply note that the default git settings on a laptop allow commits to remain on disk even when they’re not being used. Github, however, are more aggressive with their garbage collection. We will emulate that in our replica now. In ~/github/main-repo:

# Set aggressive garbage collection policy on replica Github
git config gc.pruneExpire "now"
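
As a rough illustration of what this setting does, the sketch below pushes a branch to a scratch bare remote, deletes it again, and runs garbage collection. The scratch paths, the `throwaway` branch and the demo identity are all assumptions for illustration only.

```shell
# Hypothetical scratch remote with the same aggressive pruning policy
lab="$(mktemp -d)"
git init --quiet --bare --initial-branch=master "$lab/remote.git"
git -C "$lab/remote.git" config gc.pruneExpire now

# Hypothetical working repo that pushes to it
git init --quiet --initial-branch=master "$lab/work"
cd "$lab/work"
git remote add origin "$lab/remote.git"
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
echo 'hello world' > index.html
git add --all
git commit --quiet --message 'Base commit'
git push --quiet origin master

# Push a branch, then delete it so its commit becomes unreachable on the remote
git switch --quiet --create throwaway
echo 'temporary' > temp.html
git add --all
git commit --quiet --message 'Commit that will become unreachable'
orphan="$(git rev-parse HEAD)"
git push --quiet origin throwaway
git push --quiet origin --delete throwaway

# The object is still on disk until garbage collection runs
if git -C "$lab/remote.git" cat-file -e "$orphan" 2>/dev/null; then before=present; else before=pruned; fi
git -C "$lab/remote.git" gc --quiet
if git -C "$lab/remote.git" cat-file -e "$orphan" 2>/dev/null; then after=present; else after=pruned; fi
echo "before gc: $before, after gc: $after"
```

With the default expiry, the unreachable commit would normally linger for a while; with the expiry set to "now", it is removed the next time `git gc` runs.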

Cloning our Replica Github Repo Locally

Now let’s set up a clone of the repo on a developer’s machine, within the replica:

# Create a local clone of the upstream repo
mkdir --parents ~/local
cd ~/local
git clone ../github/main-repo/

You should see:

Cloning into 'main-repo'...
warning: You appear to have cloned an empty repository.
done.

What did we just do? We have cloned a ‘github’ repository locally. Let’s have a look at our remote situation, in ~/local/main-repo:

~/local/main-repo$ git remote --verbose
origin	/home/github/local/../github/main-repo/ (fetch)
origin	/home/github/local/../github/main-repo/ (push)

You may be more used to seeing a remote URL look like git@github.com:vendor/repo.git, but this use of a local repo as a remote works just the same.

Creating a Website Component and Committing/Pushing

In the replica of a local machine at ~/local/main-repo, do the following to our clone:

# Be a project maintainer
git config user.name 'Project Maintainer'
git config user.email '[email protected]'

# Build a website
echo 'hello world' > index.html
git add --all
git commit --message 'Created website component'
git push origin master

Creating and Merging Pull Requests

Creating Feature Branches

Let’s create some feature branches locally, to prepare to make our pull requests. Locally in ~/local/main-repo:

# Create the first feature branch and push it
git switch --create feature
echo 'feature' > feature.html
git add --all
git commit --message 'Added a new feature'
git push origin feature
git switch master

# Create the second feature branch, from master, and push it
git switch --create feature2
echo 'feature 2' > feature2.html
git add --all
git commit --message 'Added another new feature'
git push origin feature2
git switch master

The commit graph is starting to emerge, so let’s keep a close eye on it:

[Image: Commit graph with two feature branches]

Here we see:

  • the commit with the master branch pointer at the bottom
  • HEAD is attached to master, which means we have this branch checked out
  • branch pointers feature and feature2, each on a commit which has the HEAD commit as its parent
  • tracking branches called things like origin/master, showing where the branch master was on the remote origin the last time there was any communication with this remote

Creating Pull Requests

Now that our feature branches have been pushed to the remote, let’s open some pull requests. On the real Github, we might see something like this:

[Image: Button to create a PR]

We need to add some functionality to our Github replica here. What happens when you push the button? We can add the necessary functionality like this:

# Create an empty script to create pull requests
mkdir --parents ~/github/github-functionality/
touch ~/github/github-functionality/create-pull-request.sh
chmod 755 ~/github/github-functionality/create-pull-request.sh

And in our shell script (~/github/github-functionality/create-pull-request.sh), let’s add this PR-creating content:

#!/bin/bash

cd ~/github/main-repo/

# Create the special 'pull' ref for the numbered pull request
git update-ref refs/pull/$1/head refs/heads/$2

# Make a temporary worktree to simulate merging, so a diff can be shown in the browser
mkdir --parents ~/github/tmp/
git worktree add ~/github/tmp/merge-worktree master
cd ~/github/tmp/merge-worktree

# Sign in as whoever
git config user.email "[email protected]"
git config user.name "Github Backend"

# Make simulated merge commit (--no-ff forces a real merge commit even when a fast-forward is possible)
git switch --detach
git merge --no-ff --no-commit "$2"
git commit --message "Simulated merge commit for PR #$1"

# Create the other pull ref for how things would look if the PR were merged
git update-ref refs/pull/$1/merge HEAD

# Clean up the temp worktree
cd ~/github/main-repo/
git worktree remove ~/github/tmp/merge-worktree/
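
To see what a simulated merge ref ends up pointing at, here is a minimal sketch in an ordinary (non-bare) scratch repository, so no temporary worktree is needed. It assumes `--no-ff` is passed to `git merge`, so that a merge commit is created even when master could simply be fast-forwarded. The scratch path and demo identity are assumptions.

```shell
# Hypothetical scratch repo with master and one feature branch
lab="$(mktemp -d)"
git init --quiet --initial-branch=master "$lab/work"
cd "$lab/work"
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
echo 'hello world' > index.html
git add --all
git commit --quiet --message 'Created website component'
git switch --quiet --create feature
echo 'feature' > feature.html
git add --all
git commit --quiet --message 'Added a new feature'

# Simulate the merge on a detached HEAD so master itself never moves
git switch --quiet --detach master
git merge --quiet --no-ff --no-commit feature
git commit --quiet --message 'Simulated merge commit for PR #1'
git update-ref refs/pull/1/merge HEAD
git switch --quiet master

# A merge commit lists two parents: the old master tip and the feature tip
parents="$(git rev-list --parents --max-count=1 refs/pull/1/merge)"
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))
echo "refs/pull/1/merge has $parent_count parents"
```

Because the merge happens on a detached HEAD, master stays where it was, and only the special ref records the would-be result.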

Now that we have built out some of the functionality of the Github website, let’s pretend someone created the pull requests:

cd ~/github/main-repo/
../github-functionality/create-pull-request.sh 1 feature
../github-functionality/create-pull-request.sh 2 feature2

A Somewhat Loose Object Reference

Inspecting the Object Reference

Let’s turn to what happens if we look at the PR’s head ref we created in more detail:

github@10df88f5e5f0:~/github/main-repo$ git show pull/2/head
commit c154b17addfeff5a94f9fc497e2fc4f3ab577155 (feature2)
Author: Project Maintainer <[email protected]>
Date:   Thu Nov 7 18:16:28 2024 +0000

    Added another new feature

diff --git a/feature2.html b/feature2.html
new file mode 100644
index 0000000..7268a88
--- /dev/null
+++ b/feature2.html
@@ -0,0 +1 @@
+feature 2

This object reference isn’t exactly loose - there is a ref pointing to it in our remote repo, but what will happen to it if the branch goes away? We can find out by deleting it. From ~/local/main-repo:

git switch master
git branch -D feature2
git push origin --delete feature2

Now, we have deleted feature2 on the clone of our remote, and pushed that branch deletion upstream. Going back to our remote to see what happens if we try to view our pull request again, from ~/github/main-repo:

github@10df88f5e5f0:~/github/main-repo$ git show pull/2/head
commit c154b17addfeff5a94f9fc497e2fc4f3ab577155
Author: Project Maintainer <[email protected]>
Date:   Thu Nov 7 18:16:28 2024 +0000

    Added another new feature

diff --git a/feature2.html b/feature2.html
new file mode 100644
index 0000000..7268a88
--- /dev/null
+++ b/feature2.html
@@ -0,0 +1 @@
+feature 2

Nothing has changed. We can still see the changeset of the PR in our replica Github, even though the branch has been deleted both on the clone and on the remote.

Implications for CFORs

The commit we have viewed isn’t part of any branch that exists on the clone, or on the remote, but the commit is recoverable by another kind of reference, which we have called refs/pull/2/head.

This shows us that there is a record here of the changeset that was PR’ed. Looking at the PR will display the code. Even if the PR gets rejected and the branch gets deleted, we should still be able to see this.

Looking at this from the perspective of building a replica of Github, it is hard to see how things could be any different. Would we rather not be able to see the rejected PR’s changeset? It seems that everything is working as it should.

However, there is undoubtedly the potential for information to live longer than one might have thought. There isn’t an obvious way of getting rid of that commit now. The implications are magnified when we deal with forks.
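
The point can be checked directly: with a pull ref in place, the commit survives even aggressive garbage collection. The sketch below is a self-contained approximation using scratch paths and a demo identity, all of which are assumptions rather than the walkthrough's own repositories.

```shell
# Hypothetical scratch remote with aggressive pruning, plus a working repo
lab="$(mktemp -d)"
git init --quiet --bare --initial-branch=master "$lab/remote.git"
git -C "$lab/remote.git" config gc.pruneExpire now
git init --quiet --initial-branch=master "$lab/work"
cd "$lab/work"
git remote add origin "$lab/remote.git"
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
echo 'hello world' > index.html
git add --all
git commit --quiet --message 'Created website component'
git push --quiet origin master

git switch --quiet --create feature2
echo 'feature 2' > feature2.html
git add --all
git commit --quiet --message 'Added another new feature'
pr_commit="$(git rev-parse HEAD)"
git push --quiet origin feature2

# "Open" PR 2 on the remote, then delete the branch and collect garbage
git -C "$lab/remote.git" update-ref refs/pull/2/head refs/heads/feature2
git push --quiet origin --delete feature2
git -C "$lab/remote.git" gc --quiet

# No branch contains the commit any more, yet the object is still there
branches="$(git -C "$lab/remote.git" branch --contains "$pr_commit")"
if git -C "$lab/remote.git" cat-file -e "$pr_commit" 2>/dev/null; then survived=yes; else survived=no; fi
echo "branches containing commit: '$branches', survived gc: $survived"
```

The only thing keeping the commit alive is refs/pull/2/head, which no ordinary branch deletion touches.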

Creating a Fork

Setting Up Fork on the Remote

Let’s fork the repo.

# Create a fork of the remote on our Github replica
git clone --bare ~/github/main-repo/ ~/github/forked-repo
cd ~/github/forked-repo
git remote add upstream ~/github/main-repo/

Cloning the Fork Locally

Cloning the fork in our replica of a local environment:

# Clone the fork
cd ~/local/
git clone ~/github/forked-repo/ ~/local/forked-repo

Committing Secrets to the Fork

Creating Secrets

Let’s commit some secrets to the forked repository ‘by accident’. We will later open a pull request, before deleting the fork in an effort to make the data leak go away. In ~/local/forked-repo/:

# Be a contributor
git config user.email '[email protected]'
git config user.name 'Project Contributor'

# Create the feature branch we will push later
git switch --create feature3

# Commit secrets by accident
echo 'AWS_CREDENTIALS=abc123!!!' > config.ini
git add config.ini
git commit --message 'Add secret to version control by accident'

# Delete secrets in a separate commit
git rm config.ini 
git commit --message 'Delete thing added by accident'

# Create more benign content
echo 'feature 3' > feature3.html
git add feature3.html 
git commit --message 'Add feature 3'

# Push branch with secrets in history
git push origin feature3

We have now pushed secrets in the history of the feature3 branch to the remote fork.
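
Deleting the file in a later commit does not remove it from history. As a minimal sketch, the following rebuilds the same sequence of commits in a scratch repository and recovers the secret from the earlier commit. The scratch path and demo identity are assumptions.

```shell
# Hypothetical scratch repo repeating the accidental commit and its removal
lab="$(mktemp -d)"
git init --quiet --initial-branch=master "$lab/fork-clone"
cd "$lab/fork-clone"
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
echo 'hello world' > index.html
git add --all
git commit --quiet --message 'Created website component'

git switch --quiet --create feature3
echo 'AWS_CREDENTIALS=abc123!!!' > config.ini
git add config.ini
git commit --quiet --message 'Add secret to version control by accident'
git rm --quiet config.ini
git commit --quiet --message 'Delete thing added by accident'

# The file is gone from the tip of the branch...
tip_files="$(git ls-tree --name-only HEAD)"

# ...but a pickaxe search still finds the commit that introduced the string
secret_commit="$(git log --format=%H --diff-filter=A -S'AWS_CREDENTIALS' -- config.ini)"
leaked="$(git show "$secret_commit:config.ini")"
echo "files at tip: $tip_files"
echo "recovered from history: $leaked"
```

Anyone who can read the branch's history can do the same, which is why the removal commit offers no protection.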

Opening a Pull Request, Then Deleting the Fork

Further Replica-Building

In terms of replica-building, we need to build out a bit more functionality for our Github replica now. Similar to our create-pull-request.sh command, we will create a shell script to create cross-fork pull requests.

# Create an empty script to create cross-fork pull requests
mkdir --parents ~/github/github-functionality/
touch ~/github/github-functionality/create-cross-fork-pull-request.sh
chmod 755 ~/github/github-functionality/create-cross-fork-pull-request.sh

Let’s populate ~/github/github-functionality/create-cross-fork-pull-request.sh with the functionality our Github replica needs:

#!/bin/bash

cd ~/github/forked-repo/

# Push the 'pull' ref to the upstream repo
git push upstream $2:refs/pull/$1/head

# Make a temporary worktree to simulate merging, so a diff can be shown in the browser
cd ~/github/main-repo/
mkdir --parents ~/github/tmp/
git worktree add ~/github/tmp/merge-worktree master
cd ~/github/tmp/merge-worktree

# Sign in as whoever
git config user.email "[email protected]"
git config user.name "Github Backend"

# Make simulated merge commit (--no-ff forces a real merge commit even when a fast-forward is possible)
git switch --detach
git merge --no-ff --no-commit "refs/pull/$1/head"
git commit --message "Simulated merge commit for PR #$1"

# Create the other pull ref for how things would look if the PR were merged
git update-ref refs/pull/$1/merge HEAD

# Clean up the temp worktree
cd ~/github/main-repo/
git worktree remove ~/github/tmp/merge-worktree/

Creating Our Cross-Fork Pull Request

We can now create our pull request like this:

~/github/github-functionality/create-cross-fork-pull-request.sh 3 feature3

Deleting the Fork

To make the point about Cross-Fork Object Reference information leaks, and the irrelevance of deleting the fork, let’s trash it now.

rm -rf ~/github/forked-repo/
rm -rf ~/local/forked-repo/

Now the fork is gone, the leaked secrets are gone, right?

Unfortunately not.
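
The whole sequence so far can be condensed into one self-contained sketch: fork, leak, open a cross-fork pull request, delete the fork, and then look in the upstream repo. Everything lives in a scratch directory; the `main.git`, `fork.git`, `seed` and `fork-clone` names and the demo identity are assumptions for illustration.

```shell
# Hypothetical upstream repo with one commit and aggressive pruning
lab="$(mktemp -d)"
git init --quiet --bare --initial-branch=master "$lab/main.git"
git -C "$lab/main.git" config gc.pruneExpire now
git init --quiet --initial-branch=master "$lab/seed"
cd "$lab/seed"
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
echo 'hello world' > index.html
git add --all
git commit --quiet --message 'Created website component'
git push --quiet "$lab/main.git" master

# Fork the upstream, clone the fork, and leak a secret on a feature branch
git clone --quiet --bare "$lab/main.git" "$lab/fork.git"
git clone --quiet "$lab/fork.git" "$lab/fork-clone"
cd "$lab/fork-clone"
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
git switch --quiet --create feature3
echo 'AWS_CREDENTIALS=abc123!!!' > config.ini
git add config.ini
git commit --quiet --message 'Add secret to version control by accident'
secret_commit="$(git rev-parse HEAD)"
git rm --quiet config.ini
git commit --quiet --message 'Delete thing added by accident'
git push --quiet origin feature3

# Cross-fork pull request: the fork's branch becomes a pull ref in the upstream
git -C "$lab/fork.git" push --quiet "$lab/main.git" feature3:refs/pull/3/head

# Delete the fork and its clone, then garbage-collect the upstream
cd "$lab"
rm -rf "$lab/fork.git" "$lab/fork-clone"
git -C "$lab/main.git" gc --quiet

# The secret commit is still readable from the upstream repo
leaked="$(git -C "$lab/main.git" show "$secret_commit:config.ini")"
echo "still in upstream: $leaked"
```

The push of the pull ref copied the branch's entire history into the upstream's object store, so removing the fork afterwards has no effect on it.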

Exposing the Cross-Fork Object Reference

Creating the Github API

Let’s build out one last piece of functionality for our Github replica-building.

# Create an empty script to act as the API
mkdir --parents ~/github/github-functionality/
touch ~/github/github-functionality/api.sh
chmod 755 ~/github/github-functionality/api.sh

And to populate this script:

#!/bin/bash

cd ~/github/main-repo/
git show "$1" 2>/dev/null || echo '404...'

Finding Out the Target Commit Hash for Reference

To help us in our demo, let’s find out the target commit hash.

Going to ~/github/main-repo and checking git log --graph --all:

[Image: Commit graph on the remote]

Looking for the message Add secret to version control by accident, for me, the target commit hash is 85c33a767c7a943b669234e683849211b5935206. You will get a different hash.

Brute-Forcing the Commit Hash

We can then keep calling the API to try and find the commit. Having cheated and looked up the correct hash, we can try a few short hashes to see how easy it would be to find.

In ~/local/, we keep trying ~/github/github-functionality/api.sh [short hash]:

[Image: Brute-forcing the commit hash and seeing the secret]

On this occasion, we needed to brute-force the commit hash up to its first 4 characters. Your mileage may vary.
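
Git accepts abbreviated hashes as short as four hex characters, so there are at most 16^4 = 65536 four-character guesses to try. The sketch below enumerates all of them against a scratch repository. As a local shortcut it feeds every guess to a single `git cat-file --batch-check` process, whereas the walkthrough's api.sh handles one guess per call. The scratch path and demo identity are assumptions.

```shell
# Hypothetical scratch repo containing one commit with a secret
lab="$(mktemp -d)"
git init --quiet --initial-branch=master "$lab/target"
cd "$lab/target"
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
echo 'AWS_CREDENTIALS=abc123!!!' > config.ini
git add config.ini
git commit --quiet --message 'Add secret to version control by accident'
secret_commit="$(git rev-parse HEAD)"

# Try every four-character prefix from 0000 to ffff.
# Each output line is "<full hash> <type> <size>" for a hit,
# or "<prefix> missing" / "<prefix> ambiguous" otherwise.
found="$(printf '%04x\n' $(seq 0 65535) \
    | git cat-file --batch-check 2>/dev/null \
    | awk '$2 == "commit" { print $1 }')"

# Show the subject line of every commit discovered this way
for hash in $found; do
    git show --no-patch --format='%h %s' "$hash"
done
```

In a repository with many objects, some four-character prefixes become ambiguous and longer guesses are needed, which is why the number of characters required varies.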

We have now recovered the secret credentials, from a repo they were never committed to in the first place. We have also shown that even deleting the fork doesn’t prevent the data leak.

Summary

After going to a great deal of effort in replica-building a model Github, we have seen up close what is happening ‘under the bonnet’ when we create a pull request. We have also examined how forks work server-side, and how the representation of a pull request requires Cross-Fork Object References, as they are now called, to be saved in a repo. We have also demonstrated how a brute force attack against a repository can reveal commits made in error against a fork, even if the fork is later deleted.

Truffle Security’s position is that this is a bug. Github say it is a feature. Both are right.

It would not be satisfactory from a product perspective if a rejected pull request simply disappeared, if the contributor deleted their fork. Forks made against open source projects are often temporary; many don’t live much longer than a rejected pull request. Having the code changes under discussion get deleted along with the fork would deprive us of traceability, or a record of discussions.

That said, for many Github users, this is an unexpected consequence. The information leakage is potentially a problem, and nobody wants things they have removed to be indelible. CFORs are now one more trick in the Github hacker’s recon arsenal. It seems likely we will be seeing more of them in the places where these things turn up.