Git: Splitting and Merging Repositories
Projects evolve and code boundaries change, but the Git commit history should be preserved.
Split Repo
Split files out into a new repo preserving all history. Existing repository: perl. File to preserve: psntrophy.pl.
First, clone the existing repository to a new location, as it will be carved down to just the file(s) we need, thereby becoming the new desired repository:
cd tmp
git clone ~/workspace/perl
cd perl
git remote remove origin
Remove everything except the matching file with git filter-branch. There are other approaches, particularly for subdirectories, but I found this to work best (all one line; the dot in the file name is escaped so that it matches only a literal dot):
git filter-branch -f --prune-empty --index-filter 'git ls-tree -z --name-only --full-tree HEAD | grep -zv "^psntrophy\.pl$" | xargs -0 git rm --cached -r -f --ignore-unmatch' -- --all
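To see which paths the grep stage hands to git rm, the pipeline can be tried on a simulated file list without touching a repository. This is only a sketch: every file name other than psntrophy.pl is made up for illustration, and grep -z assumes GNU grep. It also shows what changes when the dot in the pattern is left unescaped.

```shell
# Simulated NUL-separated listing, standing in for
# `git ls-tree -z --name-only --full-tree HEAD`.
# All names except psntrophy.pl are hypothetical.
list_files() {
    printf '%s\0' README psntrophy.pl psntrophyXpl lib
}

# Escaped dot: only the exact file name is kept; everything else is
# passed on for removal.
to_remove=$(list_files | grep -zv '^psntrophy\.pl$' | tr '\0' '\n')

# Unescaped dot: "." matches any character, so psntrophyXpl is kept too.
to_remove_loose=$(list_files | grep -zv '^psntrophy.pl$' | tr '\0' '\n')

printf 'strict pattern removes:\n%s\n' "$to_remove"
printf 'loose pattern removes:\n%s\n' "$to_remove_loose"
```

In a repository with no similarly named files the two patterns behave the same, so the difference only matters when such near-matches exist.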
Rebase to eliminate leftover commits that are unrelated to this file. These commits concern renamed files that no longer exist in HEAD; they remain only because git ls-tree HEAD never listed those files, so the filter had nothing to remove. The hashes below are specific to this repository:
git rebase --onto bf1fbad2b6f766b80c2a07edacd7084c6d3fd252 e490b6cb41ca541b7e468148a62f63435366836f
git rebase --onto 31fe475fe0c7314a2f307fa1c201cd2f9eca6992 bf1fbad2b6f766b80c2a07edacd7084c6d3fd252
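The form git rebase --onto NEWBASE UPSTREAM replays the commits after UPSTREAM onto NEWBASE, which drops everything between the two. The following sketch demonstrates this in a throwaway repository; the commit names A to D, the file names, and the identity settings are placeholders, not part of the original repository.

```shell
# Throwaway repository with four commits: A - B - C - D.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
for name in A B C D; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done

# Drop commit B: replay everything after B (C and D) onto A.
commit_a=$(git rev-parse HEAD~3)
commit_b=$(git rev-parse HEAD~2)
git rebase -q --onto "$commit_a" "$commit_b"

# Remaining commit subjects, oldest first.
subjects=$(git log --reverse --format=%s | tr '\n' ' ')
echo "$subjects"
```

After the rebase, the history contains A, C and D, and B.txt is no longer present in the working tree.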
Run garbage collection, here on 694 objects totaling about 4MB, then check the resulting size of the object store:
git gc
Counting objects: 694, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (674/674), done.
Writing objects: 100% (694/694), done.
Total 694 (delta 253), reused 0 (delta 0)
du -csh .git/objects/
592K .git/objects/
592K total
Merge Repos
Merge several repositories, each into its own subdirectory of a new repository, preserving all history. New repository: unix. Project to merge: ccli.
First, clone the project we wish to merge in so as to preserve the original in case something goes awry:
cd tmp
git clone ~/workspace/ccli
cd ccli
git remote remove origin
cd ..
Now back at top-level tmp, create a new target repository and start with an initial dummy commit:
git init unix
cd unix
echo "initial" > deleteme.txt
git add deleteme.txt
git commit -m "Initial commit."
git rm ./deleteme.txt
git commit -m "Initial delete."
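Add the ccli clone as a remote, fetch it, and merge its master branch. Git 2.9 and later refuses to merge histories that share no common ancestor unless --allow-unrelated-histories is given; omit the flag on older versions:
git remote add ccli ../ccli
git fetch ccli
git merge --allow-unrelated-histories ccli/master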
Move files and commit:
mkdir ccli
for i in $(ls -1 | grep -v ccli); do git mv $i ccli; done
git mv .ccli_env_example ccli
git commit -m "Move ccli project into ccli subdir"
git remote remove ccli
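The ls-based loop skips dotfiles, which is why .ccli_env_example is moved by hand. One possible variant iterates over git ls-tree output instead, so every tracked top-level entry is moved in a single pass. This is a sketch in a throwaway repository: the sample files, the project variable, and the identity settings are placeholders, and it assumes no top-level entry already shares the target directory's name and that file names contain no unusual characters.

```shell
# Throwaway repository standing in for the freshly merged project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
mkdir src
echo 'int main(void) { return 0; }' > src/main.c
echo 'KEY=value' > .ccli_env_example
echo 'docs' > README
git add .
git commit -q -m "Import project"

# Move every tracked top-level entry, dotfiles included, into the subdirectory.
project=ccli
mkdir "$project"
git ls-tree --name-only HEAD | while IFS= read -r entry; do
    git mv "$entry" "$project/"
done
git commit -q -m "Move $project project into $project subdir"

# Only the subdirectory should remain at the top level.
top_level=$(git ls-tree --name-only HEAD)
echo "$top_level"
```

Because the list comes from the committed tree rather than the directory listing, untracked files and the newly created target directory are never part of the loop.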
Repeat for each project to merge in.
NOTE: An alternative is to fetch each project into a separate branch and then merge the branches. This is cleaner, but the end result is the same.
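The branch-based alternative might look roughly like the following sketch, run against two throwaway repositories. The branch name ccli-import, the sample README, and the empty initial commit are assumptions made for the demonstration, and the --allow-unrelated-histories flag requires Git 2.9 or later.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the ccli clone (hypothetical content).
git init -q ccli
(
    cd ccli
    git config user.name "Example"
    git config user.email "example@example.com"
    echo 'docs' > README
    git add README
    git commit -q -m "ccli: add README"
    git branch -M master
)

# Target repository with its own first commit.
git init -q unix
cd unix
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit."

# Fetch the project straight into a local branch, then merge that branch.
git fetch -q ../ccli master:ccli-import
git merge -q --allow-unrelated-histories -m "Merge ccli" ccli-import

commit_count=$(git rev-list --count HEAD)
echo "$commit_count"
```

No named remote is needed here, so there is nothing to remove afterwards; the import branch can be deleted once the merge is done.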
NOTE: Tracking the full history of a file in git log requires the --follow flag, as this is a move like any other. There may be a history rewrite alternative.
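The effect of --follow can be checked in a throwaway repository. In this sketch the README file, its contents, and the identity settings are placeholders; the move into the ccli subdirectory mirrors the step described earlier.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Two commits touching the file at its original location.
echo 'docs' > README
git add README
git commit -q -m "Add README"
echo 'more docs' >> README
git commit -q -am "Extend README"

# Move the file into the subdirectory, as in the merge procedure.
mkdir ccli
git mv README ccli/
git commit -q -m "Move ccli project into ccli subdir"

# Count the commits reported for the new path with and without --follow.
plain=$(git log --format=%s -- ccli/README | wc -l | tr -d ' ')
followed=$(git log --follow --format=%s -- ccli/README | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s)"
echo "with --follow:    $followed commit(s)"
```

Without the flag only the move commit is listed for the new path; with it, the two earlier commits at the old path appear as well.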
NOTE: Approaches with submodules keep subdir distinct so future changes from upstream can be brought in. This is not what we want, as we wish to glue things together permanently.
Upload
Should we desire to share our work by pushing it to a new remote repository, such as GitHub:
git remote add origin https://github.com/fritzhardy/psnextract.git
git push -u origin master
Resources
- http://stackoverflow.com/questions/2797191/how-to-split-a-git-repository-while-preserving-subdirectories
- http://stackoverflow.com/questions/37219/how-do-you-remove-a-specific-revision-in-the-git-history (rebase notes in responses)
- http://saintgimp.org/2013/01/22/merging-two-git-repositories-into-one-repository-without-losing-file-history
- http://julipedia.meroh.net/2014/02/how-to-merge-multiple-git-repositories.html
- https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line