
Git - Squash All Commits in History Before Specific Commit

I have a Mercurial repo that I am converting to Git. The commit history is quite large and I do not need all of the commit history in the new repo. Once I convert the commit history to Git (and before pushing to the new repo), I want to squash all the commits before a certain tag into one commit.

So, if I have:

commit 6
commit 5
commit 4
commit 3
commit 2
commit 1 -- First commit ever

I want to end up with:

commit 6
commit 5
commit X -- squashed 1, 2, 3, 4

Note: There are thousands of commits that I need to squash. So, manually picking/marking them one by one is not an option.

The other answers so far suggest rebase. This can work, in some cases, depending on the commit graph in the converted-to-Git repository. The new fancier rebase with --rebase-merges can definitely do it. But it's kind of a clumsy way to go about it. The ideal way to do this is to convert commits starting at the first one you want to keep. That is, have your Mercurial exporter export to Git, as Git's first commit, the revision you want to pretend is the root. Have the Mercurial exporter go on to export that commit's descendants, one at a time into the importer, in the same way that the exporter was always going to do this job (whatever way that may be).

Whether and how you can do this depends on which tool(s) you are using to convert. (I have not actually done any of these conversions, but most people seem to use hg-fast-export and git fast-import. I have not looked much at the inner details of hg-fast-export, but there's no obvious reason it couldn't do this.)


Fundamentally (internally), Mercurial stores commits as changesets. This is not the case for Git: Git stores snapshots instead. However, Mercurial checks out (i.e., extracts) snapshots by summing together changesets as required, so if your tool works by doing hg checkout (or the internal equivalent), there is no issue in the first place: you simply skip every revision prior to the first snapshot you want, import that snapshot and its descendants into Git, and the resulting Git history will begin at the desired point.


If the tools you have make this inconvenient, though, note that after converting the entire repository history, including all branches and merges, into Git snapshots, your Git repository makes this relatively easy as a second pass. Your Git history might, e.g., look like this:

          o-..-o            o--o   <-- br1
         /      \          /
...--o--o--....--o--*--o--o--o--o   <-- br2
      \         /             \
       o--...--o               o   <-- master

where commit * is the first commit you want to see in your Git repository. (Note that if there are multiple histories going back before *, you have a different issue and cannot do this kind of transformation in the first place without additional history modification. But as long as * is at a sort of choke point, as it is in this diagram, it's easy to snip the graph here.)

To remove everything before *, simply use git replace to make an alternative commit that's very much like commit *, but has no parent:

git replace --graft <hash-of-*>

You now have a replacement commit that most of Git will use instead of *, and that has no parent commit. Then run git filter-branch over all branches and tags, with the no-op filter:

git filter-branch --tag-name-filter cat -- --all

Or, if you have installed git filter-repo (at present it is a separate tool, not bundled with Git):

git filter-repo --force

(Be careful with the --force option when using filter-repo: it makes filter-repo destroy the old history in this repository, but in this case, that's what we want.)

This will copy every reachable commit, including the substitute for * but excluding the original * and its history, to new commits, then update your branch and tag names.
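As a rough end-to-end sketch of the graft-then-filter approach, the script below builds a throwaway six-commit repository and cuts it at commit 4. The temporary directory, the file name file.txt, and the demo identity are all made up for illustration; FILTER_BRANCH_SQUELCH_WARNING only silences filter-branch's startup warning.

```shell
#!/bin/sh
set -eu

# Throwaway repository with six linear commits (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3 4 5 6; do
    echo "change $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# "commit 4" plays the role of * in the diagram.
cut=$(git rev-parse HEAD~2)

# Create a parentless stand-in for the cut commit ...
git replace --graft "$cut"

# ... and make it permanent by rewriting every branch and tag.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --tag-name-filter cat -- --all

count=$(git rev-list --count HEAD)
root=$(git rev-list --max-parents=0 HEAD)
echo "history now has $count commits; root is: $(git show -s --format=%s "$root")"
```

The new root keeps commit 4's snapshot and message, so it effectively stands in for "commit X" from the question.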

If using filter-branch, remove the refs/original/ namespace (see the git filter-branch documentation for details), force early scavenging of the original objects if you like (the extra commits will eventually fall away on their own), and you're done.
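The cleanup could look roughly like the sketch below. filter-branch keeps its backup refs under refs/original/; the function name cleanup_filter_branch_backups is made up for this example, and it must be run from inside the rewritten repository.

```shell
# Hypothetical helper: drop filter-branch's backup refs and prune old objects.
cleanup_filter_branch_backups() {
    # Delete every backup ref that filter-branch left under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done

    # Optional: expire reflog entries and collect the now-unreachable
    # commits immediately instead of waiting for routine garbage collection.
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

Skipping the last two commands is harmless; the old objects just linger until Git's normal garbage collection removes them.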

To do all of that in one go, the steps are:

  1. Check out the specific commit
  2. Squash everything before it into that commit
  3. Cherry-pick the commits that came after it
  4. Delete your existing branch
  5. Save the freshly built HEAD under the same branch name

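function git_squash_from() {
    local COMMIT_TO_SQUASH=$1
    local SQUASH_MESSAGE=$2
    local STARTING_BRANCH CURRENT_HEAD

    STARTING_BRANCH=$(git rev-parse --abbrev-ref HEAD) # This branch will be overwritten
    CURRENT_HEAD=$(git rev-parse HEAD)

    echo "Keeping $COMMIT_TO_SQUASH..$CURRENT_HEAD; squashing $COMMIT_TO_SQUASH and everything before it"

    # Detach HEAD at the cut point, then swap it for a parentless commit with the same tree
    git checkout "$COMMIT_TO_SQUASH" || return 1
    git reset "$(git commit-tree 'HEAD^{tree}' -m "$SQUASH_MESSAGE")" || return 1
    # Replay the later commits in order (a plain cherry-pick stops at merge commits)
    git cherry-pick "$COMMIT_TO_SQUASH..$CURRENT_HEAD" || return 1
    # Recreate the original branch on top of the new history
    git branch -D "$STARTING_BRANCH"
    git checkout -b "$STARTING_BRANCH"
}

git_squash_from 87ef7fa "Squash ... "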

You can extend it further to build the SQUASH_MESSAGE from all commit messages.
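One possible way to build that message is sketched below: it lists the subject line of every commit that is about to be folded away, oldest first. The helper name build_squash_message and the layout of the message are assumptions for illustration, not part of the function above.

```shell
# Hypothetical helper: print a squash message listing the subjects of all
# commits reachable from $1 (the last commit to be squashed), oldest first.
build_squash_message() {
    printf 'Squashed history up to %s\n\n' "$(git rev-parse --short "$1")"
    git log --reverse --format='* %s' "$1"
}
```

It could then be used as git_squash_from 87ef7fa "$(build_squash_message 87ef7fa)". With thousands of commits the message gets long, so piping the log through head or tail may be preferable.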

Suppose the original branch is master, and the new branch is new.

git checkout --orphan new commit4
git commit -m "squash commits"
git branch tmp master
git rebase commit4 tmp --onto new
git checkout new
git merge tmp
git branch -D tmp

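To keep merge commits, pass --rebase-merges to git rebase. The older -p (--preserve-merges) option did the same job but was deprecated and has been removed from recent Git releases.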
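To check that this sequence rewrites history without changing the final content, here is a small sketch that runs it in a throwaway repository. The six demo commits, the file name file.txt, and the tag commit4 are made up for the example; git diff --quiet exits successfully only when the two branches have identical trees.

```shell
#!/bin/sh
set -eu

# Throwaway repository whose master branch has six linear commits.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3 4 5 6; do
    echo "change $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
git tag commit4 HEAD~2    # the last commit to be squashed

# The sequence from above, with --onto written first.
git checkout -q --orphan new commit4
git commit -q -m "squash commits"
git branch tmp master
git rebase -q --onto new commit4 tmp
git checkout -q new
git merge -q tmp
git branch -q -D tmp

# Verify: shorter history, identical final snapshot.
new_count=$(git rev-list --count new)
git diff --quiet master new && echo "new has $new_count commits and the same final tree as master"
```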

While git reset --soft could be an option for squashing one set of commits, I would recommend, for multiple sets of commits:

  • having one original Git repo
  • doing patches between two tags (if you can go from one tag to the next),
  • applying each patch to a new Git repo where you store those squashed commits as one patch after the other.
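A minimal sketch of that patch-per-tag workflow follows. The function name squash_between_tags and its arguments are hypothetical; it assumes the destination repository already exists, is empty or clean, has a commit identity configured, and that consecutive tags actually differ in content (git apply rejects an empty patch).

```shell
# Hypothetical helper: squash_between_tags SRC_REPO DST_REPO TAG...
# Creates one commit in DST_REPO per tag, each holding the combined diff
# from the previous tag (or from nothing, for the first tag).
squash_between_tags() {
    src=$1
    dst=$2
    shift 2

    # Start from the empty tree so the first patch contains everything up to the first tag.
    prev=$(git -C "$src" hash-object -t tree /dev/null)

    for tag in "$@"; do
        # --binary keeps binary files intact; --index stages what the patch changes.
        git -C "$src" diff --binary "$prev" "$tag" -- |
            git -C "$dst" apply --index
        git -C "$dst" commit -q -m "Squashed history up to $tag"
        prev=$tag
    done
}
```

For example, squash_between_tags ./old ./new v1.0 v2.0 would leave ./new with two commits, one per tag. Author names, dates, and the original commit messages are not carried over in this sketch.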

Note that this also applies when the range includes the very first commit, through the git rebase --root option.
