git filter-repo: A git filter-branch replacement for simpletons like me

Also, if you’re not a simpleton, it has the benefit of being significantly faster, but I digress.

tl;dr: git filter-repo --replace-text <(echo "filter-branch==>filter-repo")

Recently I had the enviable task of migrating some code out of a monolithic repository.

I say enviable with emphasis because I’ve done this in the past using git filter-branch and it has always been a slog.
Partly because I found the tool difficult to understand, with its plethora of options that always felt like a trip down the rabbit hole, but mostly because I would look at the resulting repository, realise that I had forgotten something, and then wait an entire eternity for filter-branch to run again.

If you would like to play along, the following commands were run one after another against the Apache Tomcat repository.
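If you would rather not clone all of Tomcat, a tiny stand-in repository is enough to try the commands below. This is a minimal sketch assuming bash and git. `make_sandbox`, the file names and the `demo` identity are made up for illustration. It builds a one-commit repository containing java/, test/, a stray .jar and a mention of apache, and then clones it with `--no-local`. filter-repo's safety check expects a fresh clone, and a plain local clone, which hardlinks objects, may not pass that check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a throwaway repo that loosely mimics Tomcat's
# layout, then clone it so filter-repo has a fresh clone to work on.
make_sandbox() {
  local dir="$1"
  local src="$dir/sandbox-src"

  mkdir -p "$src/java/org" "$src/test/org" "$src/res" "$src/lib"
  git -C "$src" init -q

  # A few illustrative files: sources, tests, something else, and a jar.
  echo 'public class Server {} // apache' > "$src/java/org/Server.java"
  echo 'public class ServerTest {}' > "$src/test/org/ServerTest.java"
  echo 'release notes' > "$src/res/notes.txt"
  printf 'not really a jar' > "$src/lib/dep.jar"

  git -C "$src" add -A
  git -C "$src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m 'Initial layout'

  # --no-local copies objects the normal way instead of hardlinking them.
  git clone -q --no-local "$src" "$dir/sandbox"
}
```

For example, `dir=$(mktemp -d); make_sandbox "$dir"; cd "$dir/sandbox"` leaves you in a clone where each of the following commands can be tried.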

To keep only the java and test folders and discard everything else, we execute:

$> git filter-repo \
            --path java \
            --path test (1)
(1) --path can be specified multiple times; files matching any of the given paths are kept.
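Every rerun is expensive, so it helps to check what you are about to keep both before and after the command. The helper below lists the top-level directories touched by any commit reachable from any ref. `top_level_dirs` is a made-up name and is not part of filter-repo. After the rewrite above, I would expect it to print only java and test. It relies on `git log --name-only`, which does not list files for merge commits by default. Treat it as a quick sanity check rather than an exhaustive audit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each top-level directory that any commit
# reachable from any ref has touched. Files in the repository root are
# skipped because their paths contain no "/".
top_level_dirs() {
  local repo="${1:-.}"
  git -C "$repo" log --all --format= --name-only \
    | awk -F/ 'NF > 1 { print $1 }' \
    | sort -u
}
```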

All repositories need a foo, right? Let’s add our own with the following command:

$> git filter-repo \
            --path-rename test/:foo/ \
            --path-rename java/:foo/ (1)
(1) Note how here we are combining the test and java directories under foo.
Care should be taken when doing this: if a file with the same relative path exists in both directories, the two renamed files will collide at the same path under foo, so check for overlaps before running the command.
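A collision only shows up once the rename has run, so a cheap pre-flight check is worth having. This sketch needs bash for the `<(...)` process substitution, and `rename_collisions` is a hypothetical name. It lists relative paths that have appeared under both java/ and test/ anywhere in history. It errs on the side of caution: a path that lived under java/ and test/ in different commits is reported even if the two never existed at the same time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print relative paths seen under both java/ and test/
# in any commit reachable from any ref. Each one would land on the same
# foo/ path under --path-rename test/:foo/ --path-rename java/:foo/.
rename_collisions() {
  local repo="${1:-.}"
  # comm -12 prints only lines common to both sorted inputs.
  comm -12 \
    <(git -C "$repo" log --all --format= --name-only -- java/ | sed -n 's#^java/##p' | sort -u) \
    <(git -C "$repo" log --all --format= --name-only -- test/ | sed -n 's#^test/##p' | sort -u)
}
```

An empty result from `rename_collisions .` suggests that the two directories can be merged under foo safely.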

And now, in a flight of fancy, I no longer want jars in my repo:

$> git filter-repo \
            --path-glob "*.jar" \
            --invert-paths
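Before throwing the jars away, it is worth seeing which ones are actually there. This is a small sketch, and `jars_in_history` is an illustrative name. It lists every .jar path touched by any commit on any ref. In Git's default pathspec matching, `*` can cross directory separators, so `'*.jar'` also picks up jars in subdirectories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every .jar path touched by any commit reachable
# from any ref; these are the files you would expect the command to drop.
jars_in_history() {
  local repo="${1:-.}"
  git -C "$repo" log --all --format= --name-only -- '*.jar' \
    | sed '/^$/d' \
    | sort -u
}
```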

And as a final trick, we’re going to replace every occurrence of apache in the file contents with ehcapa (apache spelled backwards).

$> git filter-repo \
            --replace-text <(echo "apache==>ehcapa") (1)
(1) This should really be read in from a file, but this is much more illustrative.
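Following the note about reading the rules from a file, here is a sketch of that workflow. `write_replacements`, `files_containing` and the replacements.txt name are my own inventions for illustration. Each line of the file is one rule, and `old==>new` replaces the literal text old with new. Literal rules are case-sensitive, so Apache with a capital A is left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the replacement rules to file $1, one per line.
write_replacements() {
  cat > "$1" <<'EOF'
apache==>ehcapa
EOF
}

# Hypothetical helper: list text files in the HEAD tree of repo $1 that
# contain the literal string $2 (-F: fixed string, -I: skip binary files).
# git grep exits non-zero when nothing matches, hence the "|| true".
files_containing() {
  git -C "$1" grep -l -I -F -e "$2" HEAD -- || true
}
```

To use it from inside the clone, run `write_replacements ../replacements.txt`. Keeping the file outside the clone means it does not show up as an untracked file. Next, run `files_containing . apache` to preview what will change, and then `git filter-repo --replace-text ../replacements.txt`. Afterwards, I would expect `files_containing . apache` to print nothing.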

Pithy one-liners aside, there are some solid reasons to use filter-repo over filter-branch:

  • It’s fast: in a trivial example, I found filter-repo took about 20% of the time filter-branch did

  • Auto cleanup: repacking and cleanup are done automatically as part of the command’s execution

  • It’s designed to be simple for simpletons and extensible for experts

  • Filter-branch is effectively deprecated: its own manual page warns against using it and points to git filter-repo instead
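To make the auto cleanup point concrete, here is roughly the housekeeping you are left to do yourself after filter-branch. It follows the checklist for shrinking a repository in the git-filter-branch manual. `cleanup_after_filter_branch` is a hypothetical name. This is a sketch of the kind of steps that filter-repo's automatic cleanup spares you, not a description of how filter-repo does it internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: manual cleanup after a filter-branch run.
cleanup_after_filter_branch() {
  local repo="${1:-.}"

  # 1. Delete the refs/original/* backup refs that filter-branch creates.
  git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git -C "$repo" update-ref -d "$ref"
    done

  # 2. Expire all reflog entries so they no longer keep old commits alive.
  git -C "$repo" reflog expire --expire=now --all

  # 3. Repack, and prune objects that are now unreachable.
  git -C "$repo" gc --prune=now --quiet
}
```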