
How to split a git repository into recent and older commits at a specific commit point, preserving branches?

How can I split my git repository into two repositories (recent and history) at a specific commit SHA while also preserving the branches in each, properly linked to their commits on master?

Problem description

While many SO questions ask and answer how to split off a subdirectory (e.g., The Easy Way), that is not what I need to do. Rather, I need to split the repository's commits into those up to and including a certain commit, and those from that commit onward. While my repository is large, with thousands of commits and hundreds of branches over a ten-year history, the problem can be boiled down to a simple repository with 8 commits (1-8) and three branches (master, A, B):

1 - 2 - 3 - 4 - 5 - 6 - master
     \           \
      7           8
       \           \
        A           B

After conversion, what I want is two repositories. The first (project-history) should contain historical commits 1, 2, 3, and 4 and the associated commit 7 on branch A. The second (project-recent) should contain commits 4, 5, 6 and associated commit 8 on branch B. These would look like:

project-history                     project-recent
1 - 2 - 3 - 4 -master               4 - 5 - 6 - master
     \                                   \
      7                                   8
       \                                   \
        A                                   B

There is a similar problem described in Split a Git Repository into Two, but neither of the answers is accepted, and neither produces the result I need, which I describe below, along with a test script.

Possible Approach: Branch, then rebase using an orphaned commit

The Pro Git book Chapter 7.13 Git-Tools-Replace provides an approach that comes very close. In that approach, you first create the history, and then rebase the recent commits onto a new orphan commit.

Create the history

  1. find the SHA of the commit on which the repository is to be split
  2. create a history branch at that point
  3. push the history branch and its attached branches into a new project-history repo

This all works great.

Rebase the recent commits

But this next part doesn't work fully:

  1. Create an orphan commit, which produces commit aaf5c36
    • git commit-tree 8e3dbc5^{tree}
  2. Rebase the master onto aaf5c36 starting at the parent of the split commit
    • git rebase --preserve-merges --onto aaf5c36 8e3dbc5
  3. Push this new master and branch B into a new project-recent repo

The problem: Branch B is disconnected from master in the new project-recent repository. The resulting repositories look like:

project-history                     project-recent
1 - 2 - 3 - 4 -master               4 - 5 - 6 - master
     \                                   
      7                             1 - 2 - 3 - 4 - 5 - 8- B
       \                            
        A                                   
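
One quick way to confirm that a branch has come loose like this is `git merge-base`, which exits non-zero when two refs share no common ancestor. The sketch below is only an illustration: the throwaway repository and the branch name `detached-line` are made up for the demo and are not part of the real project.

```shell
#!/bin/bash
# Sketch: detect whether two branches share any history.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"

# Start an unrelated line of history, much as a parentless base commit does
git checkout -q --orphan detached-line
echo two > file.txt
git add file.txt
git commit -q -m "unrelated root"

# git merge-base exits non-zero when there is no common ancestor
if git merge-base master detached-line >/dev/null; then
    connected=yes
else
    connected=no
fi
echo "master and detached-line connected: $connected"
```

In the split repository, the equivalent check would be `git merge-base master B`.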

Script to illustrate the issue

The repo-split-example.sh script creates an example repository (repo-split-example), then splits it using this technique into repo-split-history and repo-split-recent repositories, but the branch B is unattached in the latter. In addition, by pushing the branch B into the recent repository, the historical commits are also pushed into the repository (commits 1, 2, 3), and there are duplicates of commits 4 and 5 (the originals, plus the rewritten ones from the rebase). Here's the final state of the project-recent repo:

$ git log --graph --all --oneline --decorate
* c29649c (HEAD -> master) sixth
* e8545fd fifth
* 8e3dbc5 fourth
* aaf5c36 Get history from historical repository at file:///Users/jones/development/git-svn-migrate/repo-split-history
* 7a98d11 (B) branchB
* 1f620ac fifth
* 1853778 fourth
* 14ab901 third
* 8dd0189 second
* bb1fc8d first

Whereas what I want is:

$ git log --graph --all --oneline --decorate
* c29649c (HEAD -> master) sixth
| * 7a98d11 (B) branchB
|/
* e8545fd fifth
* 8e3dbc5 fourth
* aaf5c36 Get history from historical repository at file:///Users/jones/development/git-svn-migrate/repo-split-history

The repo-split-example.sh script is an easy way to reproduce the problem. How can I get the project-recent repository to contain the recent commits from master plus the commits from branch B, properly linked to rebased commit 5 (fifth)?

Thanks for the advice!

Update

After looking around more, I determined that I can manually rebase the recent branches back into the newly rewritten tree. To do this, for each branch in the recent tree, I would do:

# Rebase branch B onto the newly rewritten fifth commit
git branch temp e8545fd # the SHA of the rewritten fifth commit
git checkout B
git rebase temp # This works, but will (always?) have conflicts because it starts 
                # from the beginning because there is no common merge base for the commit
git branch -d temp

So, this works, and produces the desired result. But the git rebase temp produces a large number of merge conflicts (one for every commit since the beginning of the history), because the rewritten fifth commit does not share any history with the original branch B. That means a lot of manual conflict resolution, and it would just take too long for my real repository. So I am still looking for a workable solution where the rebase works without merge conflicts.
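
One variation that may avoid those conflicts is the three-argument form `git rebase --onto <newbase> <upstream> <branch>`, which replays only the commits in `<upstream>..<branch>` instead of everything since the root. Since the rewritten fifth commit has the same tree as the original one, the replayed commits should apply cleanly. The sketch below demonstrates the idea on a throwaway repository; the variable names (`OLD_FIFTH`, `NEW_FIFTH`, and so on) are made up for the demo, and the rewritten master is simulated with `git commit-tree` rather than a real rebase.

```shell
#!/bin/bash
# Sketch: reattach branch B to a rewritten commit without replaying old history.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

for n in first second third fourth fifth; do
    echo "$n commit." >> README.md
    git add README.md
    git commit -q -m "$n"
done
OLD_FOURTH=$(git rev-parse HEAD~1)
OLD_FIFTH=$(git rev-parse HEAD)
TRUNCPARENT=$(git rev-parse HEAD~2)

# Branch B forks from the original fifth commit
git checkout -q -b B
echo "Add Branch B change." >> README.md
git add README.md
git commit -q -m "branchB"

# Stand-in for the rewritten master: same trees, rooted at a parentless base commit
BASE=$(echo "Get history from historical repository" | git commit-tree "${TRUNCPARENT}^{tree}")
NEW_FOURTH=$(echo "fourth" | git commit-tree "${OLD_FOURTH}^{tree}" -p "$BASE")
NEW_FIFTH=$(echo "fifth" | git commit-tree "${OLD_FIFTH}^{tree}" -p "$NEW_FOURTH")

# Replay only the commits in OLD_FIFTH..B onto the rewritten fifth commit
git rebase -q --onto "$NEW_FIFTH" "$OLD_FIFTH" B

git log --graph --oneline B
```

With the hashes from the log above, this would correspond to `git rebase --onto e8545fd 1f620ac B`, where 1f620ac is the original fifth commit and e8545fd the rewritten one. It still has to be repeated once per branch.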

I finally figured this out, so I am documenting the procedure here in the hope that it is useful. Instead of rebasing, one can use a graft to split the repository, and then use filter-branch to rewrite the tree to make the graft permanent. Assume that TRUNCPOINT is the SHA of the commit on which to split the repository, that TRUNCPARENT is its parent's SHA, and that project-history and project-recent are newly initialized repositories ready to receive the historical commits and the recent commits respectively. The final procedure to split the repository into two halves was then as follows:

First create a branch for the historical commits

This is simply done by creating a branch at $TRUNCPOINT, and pushing that branch and all of the branches that stem from it to project-history.

git branch history $TRUNCPOINT
git push project-history history:master
git push project-history A

That pushes the historical commits on the local history branch to the master branch of the project-history repo, and then pushes branch A to the project-history repository as well. Here's how it looks as a result:

git log --graph --oneline --decorate --all
* fdc8f84 (A) branchA a1
| * 7237a3e (HEAD -> master) fourth
| * 55be55d third
|/  
* 26555d8 second
* 5a68ca2 first

Good so far, as the most recent commit in the history is the fourth commit.

Now we need to split the repo to get the recent commits from TRUNCPOINT to HEAD of master.

Create a base commit to serve as the parent for the recent commits

These next commands create a parentless commit, carrying the tree of $TRUNCPARENT, that is going to become the new root of the recent commit tree.

MESSAGE="Get history from historical repository"
BASECOMMIT=$(echo "$MESSAGE" | git commit-tree "${TRUNCPARENT}^{tree}")

Split the repository by grafting TRUNCPOINT onto BASECOMMIT

Finally, we graft the repository, telling it that the parent of $TRUNCPOINT is now $BASECOMMIT rather than its original parent. This effectively truncates the history at $TRUNCPOINT. Then we use filter-branch to rewrite the history to make the graft permanent, and then push master and its associated branch B to the project-recent repository.

echo "${TRUNCPOINT} ${BASECOMMIT}" > .git/info/grafts
git filter-branch -- --all
git push project-recent master
git push project-recent B
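
Newer Git releases (2.18 and later, as far as I know) warn that the `.git/info/grafts` file is deprecated in favor of replace refs. A replace-ref version of the same step might look like the sketch below, which uses `git replace --graft` in place of the grafts file. It runs against a throwaway repository with a single linear branch, so the setup is an assumption for the demo; `FILTER_BRANCH_SQUELCH_WARNING` only silences the warning and pause that recent versions of filter-branch print at startup.

```shell
#!/bin/bash
# Sketch: truncate history with a replace ref instead of .git/info/grafts.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

for n in first second third fourth fifth sixth; do
    echo "$n commit." >> README.md
    git add README.md
    git commit -q -m "$n"
done

TRUNCPOINT=$(git rev-parse HEAD~2)    # "fourth"
TRUNCPARENT=$(git rev-parse HEAD~3)   # "third"
BASECOMMIT=$(echo "Get history from historical repository" | git commit-tree "${TRUNCPARENT}^{tree}")

# Replace-ref counterpart of: echo "${TRUNCPOINT} ${BASECOMMIT}" > .git/info/grafts
git replace --graft "$TRUNCPOINT" "$BASECOMMIT"

# Make the new parentage permanent, then drop the now-redundant replace ref
git filter-branch -- --all >/dev/null 2>&1
git replace -d "$TRUNCPOINT" >/dev/null

git log --oneline master
```

After the rewrite, master should consist of the base commit followed by fourth, fifth, and sixth, and the pushes to project-recent proceed as before.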

Here's the resulting split contents for the project-recent repo.

git log --graph --oneline --decorate --all
* 2335aeb (B) branchB b2
* 2bb7ea3 branchB b1
| * 83c3ae9 (HEAD -> master) sixth
|/  
* 25931c5 fifth
* 1e1e201 fourth
* a7f3373 Get history from historical repository

Note that the root commit a7f3373 is the BASECOMMIT that we artificially created, and the commit log for it can contain a message that points the user to the location of the repository with the project history, allowing future users to rejoin the two repositories using git replace if so desired. The full process as a reproducible script can be downloaded, but is also included below for reference.
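
For reference, rejoining along the lines of the Pro Git chapter could look roughly like the following sketch: the rewritten fourth commit in the recent history is replaced by the original fourth commit from the historical repository. Everything here lives in one throwaway repository, so the `recent` branch and the `HIST_FOURTH`, `NEW_FOURTH`, and `NEW_FIFTH` names are stand-ins invented for the demo.

```shell
#!/bin/bash
# Sketch: stitch truncated history back onto the full history with git replace.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

for n in first second third fourth fifth; do
    echo "$n commit." >> README.md
    git add README.md
    git commit -q -m "$n"
done
HIST_FOURTH=$(git rev-parse HEAD~1)   # plays the tip of project-history's master
OLD_FIFTH=$(git rev-parse HEAD)
TRUNCPARENT=$(git rev-parse HEAD~2)

# Stand-in for project-recent: base commit, then rewritten fourth and fifth
BASE=$(echo "Get history from historical repository" | git commit-tree "${TRUNCPARENT}^{tree}")
NEW_FOURTH=$(echo "fourth" | git commit-tree "${HIST_FOURTH}^{tree}" -p "$BASE")
NEW_FIFTH=$(echo "fifth" | git commit-tree "${OLD_FIFTH}^{tree}" -p "$NEW_FOURTH")
git branch recent "$NEW_FIFTH"

before=$(git rev-list --count recent)   # base, fourth, fifth

# Whenever Git reads the rewritten fourth commit, hand it the historical one instead
git replace "$NEW_FOURTH" "$HIST_FOURTH"

after=$(git rev-list --count recent)    # fifth, fourth, third, second, first
echo "commits visible before: $before, after: $after"
```

In a real clone of project-recent, the historical commits would first have to be fetched, for example by adding project-history as a remote and running `git fetch project-history`.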

The only other major issue we have is in trying to determine, in our real-world case, which branches should be pushed to the historical repo and which should be pushed to the recent repo. But this answer shows how the split itself was completed to create two repositories.
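
As a starting point for that sorting problem, one possible heuristic is to ask Git which branches contain $TRUNCPOINT: those that do have work after the split and are candidates for project-recent, while those that do not forked earlier and are candidates for project-history. This is only a sketch of that heuristic on the example layout, not what we did for the real repository; branches that forked before the split point but are still active would need to be reviewed by hand.

```shell
#!/bin/bash
# Sketch: sort branches by whether they contain the split commit.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Helper for the demo: append a line to README.md and commit it
commit() {
    echo "$1" >> README.md
    git add README.md
    git commit -q -m "$1"
}

commit first
commit second
git checkout -q -b A
commit "branchA a1"
git checkout -q master
commit third
commit fourth
TRUNCPOINT=$(git rev-parse HEAD)
commit fifth
git checkout -q -b B
commit "branchB b1"
git checkout -q master
commit sixth

# Branches whose history includes TRUNCPOINT: candidates for project-recent
recent=$(git for-each-ref --format='%(refname:short)' --contains "$TRUNCPOINT" refs/heads | sort | xargs)
# Branches that forked before TRUNCPOINT: candidates for project-history
historical=$(git for-each-ref --format='%(refname:short)' --no-contains "$TRUNCPOINT" refs/heads | sort | xargs)

echo "recent:     $recent"
echo "historical: $historical"
```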

Fully reproduced example bash script

#!/bin/bash
WORKDIR=${PWD}

create_repos () {
    rm -rf repo-split-example repo-split-recent repo-split-history
    # Create the repo to be split
    example_repo

    # Create the repo to contain the historical commits
    HISTREPO="file://${WORKDIR}/repo-split-history"
    mkdir ../repo-split-history
    cd ../repo-split-history/
    git init --bare
    cd ../repo-split-example
    git remote add project-history $HISTREPO

    # Create the repo to contain the recent commits
    RECEREPO="file://${WORKDIR}/repo-split-recent"
    mkdir ../repo-split-recent
    cd ../repo-split-recent/
    git init --bare
    cd ../repo-split-example
    git remote add project-recent $RECEREPO
}

example_repo () {
    # Part I: set up a test repo with our example commits
    mkdir repo-split-example
    cd repo-split-example/
    git init
    echo "We want to split the repository into project-recent and project-history portions, following the instructions at https://git-scm.com/book/en/v2/Git-Tools-Replace., but also including branches." > README.md
    echo " "
    echo "First commit." >> README.md
    git add README.md
    git commit -m "first"
    echo "Second commit." >> README.md
    git add README.md
    git commit -m "second"

    git checkout -b A HEAD
    echo "Add Branch A change." >> README.md
    git add README.md
    git commit -m "branchA a1"

    git checkout master
    echo "Third commit." >> README.md
    git add README.md
    git commit -m "third"
    TRUNCPARENT=`git rev-parse HEAD`

    echo "Fourth commit." >> README.md 
    git add README.md
    git commit -m "fourth"
    TRUNCPOINT=`git rev-parse HEAD`

    echo "Fifth commit." >> README.md
    git add README.md
    git commit -m "fifth"
    FIFTH=`git rev-parse HEAD`

    git checkout -b B HEAD
    echo "Add Branch B change. b1" >> README.md
    git add README.md
    git commit -m "branchB b1"
    B1=`git rev-parse HEAD`

    echo "Add Branch B change. b2" >> README.md
    git add README.md
    git commit -m "branchB b2"
    B2=`git rev-parse HEAD`

    git checkout master
    echo "Sixth commit." >> README.md
    git add README.md
    git commit -m "sixth"

    # Now we have a repo with the requisite structure, ready to be split
    git log --graph --all --oneline --decorate
}


split_repo () {
    # Part II: Split the git repo into historical and current halves at $TRUNCPOINT
    # Following guidelines at https://git-scm.com/book/en/v2/Git-Tools-Replace

    # First create a branch for the historical commits
    echo "Branching history at $TRUNCPOINT"
    git branch history $TRUNCPOINT
    git log --graph --oneline --decorate history A

    # Now copy the history repo to the remote HISTREPO repository
    git push project-history history:master
    git push project-history A

    # Now to split the repo to get the recent history from TRUNCPOINT to HEAD of master
    # Create a base commit for the new recent history
    MESSAGE="Get history from historical repository at $HISTREPO"
    BASECOMMIT=$(echo "$MESSAGE" | git commit-tree "${TRUNCPARENT}^{tree}")

    # Split the repository by grafting TRUNCPOINT onto BASECOMMIT
    echo "${TRUNCPOINT} ${BASECOMMIT}" > .git/info/grafts
    git filter-branch -- --all

    # Finally, push the current rewritten master and associated branches to a new repository
    git push project-recent master
    git push project-recent B
}

create_repos
split_repo 
