
delete first x commits in git history and remove all merge branches from the rest of the history

I have a git project with close to 400 commits in its history. I want to remove the first (earliest) 200 commits. Then, in the remaining 200 commits, I want to delete all the merge commits and keep the rest in order.

After that is done I want to go through all the remaining commits and change one specific author email.

Is there a way to do this gracefully?

As several people already said, this is rarely a good idea, for several reasons that I won't repeat. I want to add one more thing, though, and then show how you can do this with git filter-branch.

It's not a delete, it's a new copy: essentially, a new repo

The critical thing to know about this is that you cannot remove commits from the front or middle of a series of commits. The reason is simple: each commit records, as part of its identity, the identity of its parent commit(s). The technical term for this is that the graph of commits forms a Merkle tree (strictly speaking a Merkle DAG, since a merge commit has more than one parent).

More concretely, the identity—the "true name", if you will—of a commit is its SHA-1. The SHA-1 is a cryptographic 1 hash of the data within the commit. One of the pieces of data is the parent line. Here's an actual commit within the git source itself (minus @ signs to foil spam email harvesting):

tree 55c0d854767f92185f0399ec0b72062374f9ff12
parent 8413a79e67177d026d2d8e1ac66451b80bb25d62
author Junio C Hamano <gitster pobox.com> 1436563740 -0700
committer Junio C Hamano <gitster pobox.com> 1436563740 -0700

The last minute bits of fixes

Signed-off-by: Junio C Hamano <gitster pobox.com>

If you were to try to delete a parent commit, anywhere within the chain, you'd get a new, different hash number for the child commit. This means that all its children need to change as well, to incorporate the new SHA-1s, all down the chain.
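This can be checked in a throwaway repository. The following is a minimal sketch, assuming git is installed; the repository, the identity (demo@example.com), the dates and the commit messages are all made up for illustration. It builds the same commit twice with git commit-tree, then once more with a different parent, so the IDs can be compared.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name and date below is a made-up assumption.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Pin identity and dates so that the commit contents are fully reproducible.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='2000-01-01 00:00:00 +0000'
export GIT_COMMITTER_DATE='2000-01-01 00:00:00 +0000'

git commit -q --allow-empty -m first
git commit -q --allow-empty -m second

first=$(git rev-parse 'HEAD~1')
second=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')

# Same tree, message, identity, dates and parent: the ID is reproduced exactly.
child_a=$(git commit-tree -p "$second" -m third "$tree")
child_b=$(git commit-tree -p "$second" -m third "$tree")

# Only the parent line differs: the ID changes.
child_c=$(git commit-tree -p "$first" -m third "$tree")

# Raw commit contents: tree, parent, author and committer lines, then the message.
git cat-file -p "$child_a"
printf 'same parent:      %s\n                  %s\n' "$child_a" "$child_b"
printf 'different parent: %s\n' "$child_c"
```

The first two IDs match and the third does not, even though the tree, message, author and dates are identical in all three.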

What this means to you is that to get anything, including git filter-branch, to seem to delete some commits, you must copy every commit-to-keep to a new commit with a new, different ID (one that has the same tree, message and so on as before, but a different parent line). 2

In essence, the result of doing git filter-branch is to make a new copy of the repository, with at least some, and maybe entirely, new and different commits in it. This in turn means that anyone else working with the old repository has to discard their old repository and switch to the new one.

git filter-branch

While git filter-branch has a lot of options, its core job boils down to this. For each commit: 3

  • expand the commit's source tree
  • get the author and committer (name, email, and time stamps)
  • apply all the filters:
    • make any necessary changes to the tree
    • make any necessary changes to author and committer
    • keep or skip this particular commit: if keeping this commit, make a new commit from what's left
  • add an entry to the mapping file, "original SHA-1" to "new SHA-1"

The bullet-pointed list here is the "copy" step, after which there's one last task: "update references". To understand this part properly, you need to know how git's references work, but in short, branch names (and, if you add a --tag-name-filter, tag names as well) are checked to see if they pointed to an old commit that got rewritten. If so, they are changed to point to the new copy, or, for commits that were skipped, to the nearest new-copy commit.
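The author-and-committer step and the reference update can be seen together in a small experiment. This is only a sketch under assumptions: the addresses old@example.com and new@example.com and the demo repository are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING is set so that newer git versions skip their deprecation notice. It rewrites one email address with --env-filter and then inspects the branch and the backup that filter-branch leaves under refs/original/.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its pause

# Throwaway repository with two commits by a hypothetical author.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=old@example.com commit -q --allow-empty -m one
git -c user.name=demo -c user.email=old@example.com commit -q --allow-empty -m two
old_tip=$(git rev-parse HEAD)

# The env filter runs once per commit; changed variables must be re-exported.
git filter-branch --env-filter '
    if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
        export GIT_AUTHOR_EMAIL="new@example.com"
    fi
    if [ "$GIT_COMMITTER_EMAIL" = "old@example.com" ]; then
        export GIT_COMMITTER_EMAIL="new@example.com"
    fi
' -- --all

new_tip=$(git rev-parse HEAD)
emails=$(git log --format='%ae %ce' | sort -u)
backup=$(git for-each-ref --format='%(objectname)' refs/original/)

echo "old tip:  $old_tip"
echo "new tip:  $new_tip"
echo "backup:   $backup"
echo "emails:   $emails"
```

The branch now points at the new copies, while refs/original/ still holds the old tip until it is removed.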

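To achieve what you want, you need to write a commit filter that uses the skip_commit function to omit the commits you want deleted (the first 200 and the merges), and uses git commit-tree on the rest. Note that skip_commit leaves out the commit itself but not its changes: the next kept commit still records the full tree as it stood at that point. See the git filter-branch documentation for more details.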

(One reason git filter-branch has so many options is that expanding and re-compressing entire source trees is very slow. The script attempts to avoid this, and if all your filters can be done within the index and commit-graph—without expanding out the source trees—the filter completes much more quickly.)

Example implementation based on a new commit root:

The code below rewrites the repository so that it consists only of the commits that descend from the specified STARTCOMMIT (STARTCOMMIT itself included). Branches and tags are kept.

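# The first commit to keep; it becomes the new root.
export STARTCOMMIT=.....

# merge-base --is-ancestor exits with 1 when STARTCOMMIT is not an ancestor
# of the commit being rewritten: skip those commits and re-create the rest.
# --tag-name-filter cat keeps tag names unchanged.
git filter-branch --tag-name-filter cat \
   --commit-filter '
     git merge-base --is-ancestor "$STARTCOMMIT" "$GIT_COMMIT";
     if [ $? -eq 1 ];
     then
        skip_commit "$@";
     else
        git commit-tree "$@";
     fi' \
   -- --all

# remove the backup references that filter-branch saved under refs/original/
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# reduce repo size: expire the reflogs, then prune the unreachable old commits
git reflog expire --expire=now --all && git gc --aggressive --prune=all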
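The STARTCOMMIT value is left open above. One possible way to locate the commit that follows the first N commits is to list the history oldest-first and take entry N+1. This is a sketch under assumptions: N and the demo repository are hypothetical, and --first-parent is used so that "the first N commits" is counted along the mainline only. In a history with merges there is no single linear order, so it is worth inspecting the chosen commit before rewriting anything.

```shell
#!/bin/sh
set -eu

# Throwaway repository with five linear commits (made up for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q .
for i in 1 2 3 4 5; do
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $i"
done

N=2   # hypothetical: number of earliest commits to drop (200 in the question)

# Oldest-first list of mainline commits; line N+1 is the first commit to keep.
STARTCOMMIT=$(git rev-list --reverse --first-parent HEAD | sed -n "$((N + 1))p")
export STARTCOMMIT

# Show the chosen commit so it can be checked by eye.
git log -1 --format='%h %s' "$STARTCOMMIT"
```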

1 The implication of the "cryptographic" adjective is that you can't simply make a slight change to the commit, e.g., adding text to the message, to produce the same old SHA-1 that you had before. The only way to do that in a computationally feasible time is to break the hash function itself.

2 In less-intensive-change cases, if you make an exact copy of an original commit, you wind up with the same SHA-1 you had before. For instance, if you have a filter-branch operation that deletes the second-to-tip-most commit in a chain, only the tip-most commit gets a new SHA-1. In this particular case, though, we're proposing to delete the root commit, which necessarily renumbers every subsequent commit.

3 The commits to be copied are obtained from the gitrevisions-style arguments you supply as part of the filter-branch operation. The branch names to rewrite are also taken from here, using the "positive references".

First, please think twice about whether you really want to do this. (Changing history, especially on a public repository, is usually a bad idea.)

You can use git rebase -i to do so. There you can use fixup to combine two commits into one, and edit to change a commit (including changing its author).

For automated changes on multiple commits you can use git filter-branch. But only use this if you know what you are doing.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 