How can I split my git repository into two repositories (recent and history) at a specific commit SHA while also preserving the branches in each, properly linked to their commits on master?
While many SO questions ask and answer how to split off a subdirectory (e.g., The Easy Way), that is not what I need to do. Rather, I need to split the repository's commits into those before a certain commit and those that follow it. While my repository is large, with thousands of commits and hundreds of branches over a ten-year history, the problem can be boiled down to a simple repository with 8 commits (1-8) and three branches (master, A, B):
1 - 2 - 3 - 4 - 5 - 6 - master
     \           \
      7           8
       \           \
        A           B
After conversion, what I want is two repositories. The first (project-history) should contain historical commits 1, 2, 3, and 4 and the associated commit 7 on branch A. The second (project-recent) should contain commits 4, 5, 6 and associated commit 8 on branch B. These would look like:
project-history               project-recent
1 - 2 - 3 - 4 - master        4 - 5 - 6 - master
     \                             \
      7                             8
       \                             \
        A                             B
There is a similar problem described in Split a Git Repository into Two, but neither of the answers is accepted, and neither produces the results I need, which I describe below, along with a test script.
The Pro Git book, Chapter 7.13 (Git Tools - Replace), provides an approach that comes very close. In that approach, you first create the history by making a history branch at the split point, and then rebase the recent commits onto a new orphan commit. The first part, creating the history branch at that point, all works great.
But this next part doesn't work fully:
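# Create the new orphan base commit; commit-tree prints its SHA (aaf5c36)
git commit-tree 8e3dbc5^{tree}
aaf5c36
# Rebase the recent commits onto it, starting at the parent of the split commit
git rebase --preserve-merges --onto aaf5c36 8e3dbc5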
The problem: Branch B is disconnected from master in the new project-recent repository. The resulting repositories look like:
project-history               project-recent
1 - 2 - 3 - 4 - master        4 - 5 - 6 - master
     \
      7                       1 - 2 - 3 - 4 - 5 - 8 - B
       \
        A
The repo-split-example.sh script creates an example repository (repo-split-example), then splits it using this technique into repo-split-history and repo-split-recent repositories, but branch B is unattached in the latter. In addition, pushing branch B into the recent repository also pushes the historical commits (1, 2, 3) into it, and there are duplicates of commits 4 and 5 (the originals, plus the rewritten ones from the rebase). Here's the final state of the project-recent repo:
$ git log --graph --all --oneline --decorate
* c29649c (HEAD -> master) sixth
* e8545fd fifth
* 8e3dbc5 fourth
* aaf5c36 Get history from historical repository at file:///Users/jones/development/git-svn-migrate/repo-split-history
* 7a98d11 (B) branchB
* 1f620ac fifth
* 1853778 fourth
* 14ab901 third
* 8dd0189 second
* bb1fc8d first
Whereas what I want is:
$ git log --graph --all --oneline --decorate
* c29649c (HEAD -> master) sixth
| * 7a98d11 (B) branchB
|/
* e8545fd fifth
* 8e3dbc5 fourth
* aaf5c36 Get history from historical repository at file:///Users/jones/development/git-svn-migrate/repo-split-history
The repo-split-example.sh script is an easy way to reproduce the problem. How can I get the project-recent repository to contain the recent commits from master plus the commits from branch B, properly linked to the rebased commit 5 (fifth)?
Thanks for the advice!
After looking around more, I determined that I can manually rebase the recent branches back into the newly rewritten tree. To do this, for each branch in the recent tree, I would do:
# Rebase branch B onto the newly rewritten fifth commit
git branch temp e8545fd # the SHA of the rewritten fifth commit
git checkout B
git rebase temp # This works, but will (always?) have conflicts because it starts
# from the beginning because there is no common merge base for the commit
git branch -d temp
So, this works, and produces the desired result. But the git rebase temp produces a large number of merge conflicts (one for every commit since the beginning of the history), because the rewritten fifth commit does not share any history with the original branch B. That means a lot of manual conflict resolution, which would take too long for my real repository. So I am still looking for a workable solution where the rebase works without merge conflicts.
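One possible way to keep the rebase from replaying the whole history is to give it an explicit upstream with --onto, so that only the commits made after the original fork point are replayed. The following is a minimal sketch on a throwaway repository, not a tested recipe for the real one. The temporary directory, the commit helper, and the names OLD_FIFTH and NEW_FIFTH are assumptions made for illustration.

```shell
#!/bin/sh
set -eu
# Throwaway repository; every name below is an illustrative assumption.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: append a line to README.md and commit it.
commit () { echo "$1" >> README.md; git add README.md; git commit -q -m "$1"; }

commit first; commit second; commit third
TRUNCPARENT=$(git rev-parse HEAD)
commit fourth; commit fifth
OLD_FIFTH=$(git rev-parse HEAD)
git checkout -q -b B; commit "branchB"
git checkout -q master; commit sixth

# Rewrite master onto a new parentless base commit, as in the Pro Git approach.
BASE=$(echo "Get history from historical repository" | git commit-tree "${TRUNCPARENT}^{tree}")
git rebase -q --onto "$BASE" "$TRUNCPARENT" master
NEW_FIFTH=$(git rev-parse master~1)

# Replay only the commits in OLD_FIFTH..B onto the rewritten fifth commit.
git rebase -q --onto "$NEW_FIFTH" "$OLD_FIFTH" B

git log --graph --oneline --decorate master B
```

For a repository with many branches, this would need a mapping from each branch's original fork point to its rewritten counterpart, which this sketch does not attempt to build.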
I finally figured this out, so I am documenting the procedure here in the hope that it is useful. Instead of rebasing, one can use a graft to split the repository, and then use filter-branch to rewrite the tree to make the graft permanent. So, given that TRUNCPOINT is the SHA of the commit on which to split the repository, TRUNCPARENT is its parent's SHA, and both project-history and project-recent are newly initialized repositories ready to receive the historical commits and recent commits, the final procedure to split the repository into two halves was as follows:
This is simply done by creating a branch at $TRUNCPOINT, and pushing that branch and all of the branches that stem from it to project-history.
git branch history $TRUNCPOINT
git push project-history history:master
git push project-history A
That pushes the historical commits on the local history branch to the master branch of the project-history repo, and then pushes branch A to the project-history repository as well. Here's how it looks as a result:
git log --graph --oneline --decorate --all
* fdc8f84 (A) branchA a1
| * 7237a3e (HEAD -> master) fourth
| * 55be55d third
|/
* 26555d8 second
* 5a68ca2 first
Good so far, as the most recent commit in the history is the fourth commit.
Now we need to split the repo to get the recent commits from TRUNCPOINT to HEAD of master.
These next commands create a parentless commit (carrying the tree of $TRUNCPARENT) that is going to become the new root of the recent commit tree.
MESSAGE="Get history from historical repository"
BASECOMMIT=$(echo "$MESSAGE" | git commit-tree "${TRUNCPARENT}^{tree}")
Finally, we graft the repository, telling it that the parent of $TRUNCPOINT is now $BASECOMMIT rather than its original parent. This effectively truncates the history at $TRUNCPOINT. Then we use filter-branch to rewrite the history to make the graft permanent, and then push master and its associated branch B to the project-recent repository.
echo "${TRUNCPOINT} ${BASECOMMIT}" > .git/info/grafts
git filter-branch -- --all
git push project-recent master
git push project-recent B
Here's the resulting split contents for the project-recent repo.
git log --graph --oneline --decorate --all
* 2335aeb (B) branchB b2
* 2bb7ea3 branchB b1
| * 83c3ae9 (HEAD -> master) sixth
|/
* 25931c5 fifth
* 1e1e201 fourth
* a7f3373 Get history from historical repository
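The graft step above writes to .git/info/grafts, which newer Git versions flag as deprecated in favor of replacement refs. As a hedged alternative, the same truncation can probably be expressed with git replace --graft, since filter-branch also makes replacement refs permanent. The sketch below runs on a throwaway repository; the temporary directory and the commit helper are assumptions for illustration, and FILTER_BRANCH_SQUELCH_WARNING only suppresses the warning that recent Git versions print before filter-branch runs.

```shell
#!/bin/sh
set -eu
# Throwaway repository; every name below is an illustrative assumption.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: append a line to README.md and commit it.
commit () { echo "$1" >> README.md; git add README.md; git commit -q -m "$1"; }

commit first; commit second; commit third
TRUNCPARENT=$(git rev-parse HEAD)
commit fourth
TRUNCPOINT=$(git rev-parse HEAD)
commit fifth
git checkout -q -b B; commit "branchB b1"
git checkout -q master; commit sixth

# Parentless base commit that becomes the new root of the recent history.
BASECOMMIT=$(echo "Get history from historical repository" | git commit-tree "${TRUNCPARENT}^{tree}")

# Record the graft as a replacement ref instead of writing .git/info/grafts.
git replace --graft "$TRUNCPOINT" "$BASECOMMIT"

# filter-branch bakes the replacement into the rewritten commits.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- --all

# The replacement ref is no longer needed once history has been rewritten.
git replace -d "$TRUNCPOINT"

git log --graph --oneline --decorate master B
```

After this runs, master and B should both descend from the base commit and share the rewritten fifth commit, which is the shape shown in the log output above.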
Note that the root commit a7f3373 is the BASECOMMIT that we artificially created, and the commit log for it can contain a message that points the user to the location of the repository with the project history, allowing future users to rejoin the two repositories using git replace if so desired. The full process as a reproducible script can be downloaded, but is also included below for reference.
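To illustrate the rejoin mentioned above, here is a rough sketch of how git replace might stitch the two halves back together. It simulates both halves inside one throwaway repository: the local history branch stands in for the fetched master of project-history, and a rebase stands in for the truncated recent history. The temporary directory, the commit helper, and the name RECENT_FOURTH are assumptions; in a real clone of project-recent, the historical commits would first have to be fetched from the project-history remote.

```shell
#!/bin/sh
set -eu
# Throwaway repository; every name below is an illustrative assumption.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: append a line to README.md and commit it.
commit () { echo "$1" >> README.md; git add README.md; git commit -q -m "$1"; }

commit first; commit second; commit third
TRUNCPARENT=$(git rev-parse HEAD)
commit fourth
TRUNCPOINT=$(git rev-parse HEAD)
commit fifth; commit sixth

# Stand-in for the fetched history: commits first..fourth.
git branch history "$TRUNCPOINT"

# Stand-in for the truncated recent history: base, fourth, fifth, sixth.
BASE=$(echo "Get history from historical repository" | git commit-tree "${TRUNCPARENT}^{tree}")
git rebase -q --onto "$BASE" "$TRUNCPARENT" master
RECENT_FOURTH=$(git rev-parse master~2)

# Substitute the historical fourth commit for the truncated one.
git replace "$RECENT_FOURTH" "$TRUNCPOINT"

# The log now walks from sixth back to first.
git log --oneline master
```

The replacement is only a local ref; running git with --no-replace-objects shows the truncated history again, and git replace -d removes the link.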
The only other major issue we have is in trying to determine, in our real-world case, which branches should be pushed to the historical repo and which should be pushed to the recent repo. But this answer shows how the split itself was completed to create two repositories.
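One way the branch sorting might be approached is to test whether each branch tip descends from TRUNCPOINT: if it does, the branch forked at or after the split and would go to the recent repository; otherwise it forked earlier and would go to the historical one. The sketch below tries this on a throwaway repository. The temporary directory, the commit helper, and the recent_branches and history_branches variables are assumptions for illustration, and real repositories may have edge cases (such as merged or stale branches) that it does not handle.

```shell
#!/bin/sh
set -eu
# Throwaway repository; every name below is an illustrative assumption.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: append a line to README.md and commit it.
commit () { echo "$1" >> README.md; git add README.md; git commit -q -m "$1"; }

commit first; commit second
git checkout -q -b A; commit "branchA a1"
git checkout -q master; commit third; commit fourth
TRUNCPOINT=$(git rev-parse HEAD)
commit fifth
git checkout -q -b B; commit "branchB b1"
git checkout -q master; commit sixth

# Sort branches by whether TRUNCPOINT is an ancestor of the branch tip.
recent_branches=""
history_branches=""
for b in $(git for-each-ref --format='%(refname:short)' refs/heads); do
    if git merge-base --is-ancestor "$TRUNCPOINT" "$b"; then
        recent_branches="$recent_branches $b"
    else
        history_branches="$history_branches $b"
    fi
done

echo "recent:$recent_branches"
echo "history:$history_branches"
```

In this example, master and B land in the recent list and A lands in the history list, matching the split described in the question.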
#!/bin/bash
WORKDIR=${PWD}
create_repos () {
    rm -rf repo-split-example repo-split-recent repo-split-history
    # Create the repo to be split (this leaves us inside repo-split-example)
    example_repo
    # Create the repo to contain the historical commits
    HISTREPO="file://${WORKDIR}/repo-split-history"
    mkdir ../repo-split-history
    cd ../repo-split-history/
    git init --bare
    cd ../repo-split-example
    git remote add project-history "$HISTREPO"
    # Create the repo to contain the recent commits
    RECEREPO="file://${WORKDIR}/repo-split-recent"
    mkdir ../repo-split-recent
    cd ../repo-split-recent/
    git init --bare
    cd ../repo-split-example
    git remote add project-recent "$RECEREPO"
}
example_repo () {
    # Part I: set up a test repo with our example commits
    # (assumes the default branch created by git init is named master)
    mkdir repo-split-example
    cd repo-split-example/
    git init
    echo "We want to split the repository into project-recent and project-history portions, following the instructions at https://git-scm.com/book/en/v2/Git-Tools-Replace, but also including branches." > README.md
    echo " "
    echo "First commit." >> README.md
    git add README.md
    git commit -m "first"
    echo "Second commit." >> README.md
    git add README.md
    git commit -m "second"
    # Branch A forks from the second commit
    git checkout -b A HEAD
    echo "Add Branch A change." >> README.md
    git add README.md
    git commit -m "branchA a1"
    git checkout master
    echo "Third commit." >> README.md
    git add README.md
    git commit -m "third"
    TRUNCPARENT=$(git rev-parse HEAD)
    echo "Fourth commit." >> README.md
    git add README.md
    git commit -m "fourth"
    TRUNCPOINT=$(git rev-parse HEAD)
    echo "Fifth commit." >> README.md
    git add README.md
    git commit -m "fifth"
    FIFTH=$(git rev-parse HEAD)
    # Branch B forks from the fifth commit
    git checkout -b B HEAD
    echo "Add Branch B change. b1" >> README.md
    git add README.md
    git commit -m "branchB b1"
    B1=$(git rev-parse HEAD)
    echo "Add Branch B change. b2" >> README.md
    git add README.md
    git commit -m "branchB b2"
    B2=$(git rev-parse HEAD)
    git checkout master
    echo "Sixth commit." >> README.md
    git add README.md
    git commit -m "sixth"
    # Now we have a repo with the requisite structure, ready to be split
    git log --graph --all --oneline --decorate
}
split_repo () {
    # Part II: Split the git repo into historical and current halves at $TRUNCPOINT
    # Following guidelines at https://git-scm.com/book/en/v2/Git-Tools-Replace
    # First create a branch for the historical commits
    echo "Branching history at $TRUNCPOINT"
    git branch history "$TRUNCPOINT"
    git log --graph --oneline --decorate history A
    # Now copy the history branch to the remote HISTREPO repository
    git push project-history history:master
    git push project-history A
    # Now split the repo to get the recent history from TRUNCPOINT to HEAD of master
    # Create a base commit for the new recent history
    MESSAGE="Get history from historical repository at $HISTREPO"
    BASECOMMIT=$(echo "$MESSAGE" | git commit-tree "${TRUNCPARENT}^{tree}")
    # Split the repository by grafting TRUNCPOINT onto BASECOMMIT
    echo "${TRUNCPOINT} ${BASECOMMIT}" > .git/info/grafts
    # Rewrite all refs so that the graft becomes permanent
    git filter-branch -- --all
    # Finally, push the rewritten master and associated branches to a new repository
    git push project-recent master
    git push project-recent B
}
create_repos
split_repo