
Delete first x commits in git history and remove all merge branches from the rest of the history

I have a git project with close to 400 commits in its history. I want to remove the first (earliest) 200 commits. Then, in the remaining 200 commits, I want to delete all the merge commits and keep the rest in order.

After that is done, I want to go through all the remaining commits and change one specific author email.

Is there a way to do this gracefully?

As several people already said, this is rarely a good idea, for several reasons that I won't repeat. I want to add one more thing, though, and then show how you can do this with git filter-branch.

It's not a delete, it's a new copy: essentially, a new repo

The critical thing to know about this is that you cannot remove commits from the front or middle of a series of commits. The reason is simple: each commit records, as part of its identity, the identity of its parent commit(s). The technical term for this is that the graph of commits forms a Merkle tree.

More concretely, the identity (the "true name", if you will) of a commit is its SHA-1. The SHA-1 is a cryptographic [1] hash of the data within the commit. One of the pieces of data is the parent line. Here's an actual commit within the git source itself (minus @ signs to foil spam email harvesting):

tree 55c0d854767f92185f0399ec0b72062374f9ff12
parent 8413a79e67177d026d2d8e1ac66451b80bb25d62
author Junio C Hamano <gitster pobox.com> 1436563740 -0700
committer Junio C Hamano <gitster pobox.com> 1436563740 -0700

The last minute bits of fixes

Signed-off-by: Junio C Hamano <gitster pobox.com>

If you were to try to delete a parent commit, anywhere within the chain, you'd get a new, different hash for the child commit. This means that all of its children need to change as well, to incorporate the new SHA-1s, all the way down the chain.
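The dependence of a commit's ID on its parent line can be checked in a throwaway repository. The following is a minimal sketch, not part of the original answer: the temporary directory, the made-up identity (Demo, demo@example.com), and the fixed dates are all assumptions chosen only so that the hashes are repeatable.

```shell
# Throwaway repository; identity and dates are illustrative and fixed
# so that identical inputs produce identical commit IDs.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='946684800 +0000' GIT_COMMITTER_DATE='946684800 +0000'

git commit -q --allow-empty -m first
git commit -q --allow-empty -m second

parent=$(git rev-parse HEAD~1)
child=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')

# The raw commit object contains a "parent" line naming the first commit.
git cat-file -p "$child"

# Rebuilding the commit from identical inputs reproduces the same ID ...
same=$(git commit-tree -p "$parent" -m second "$tree")
# ... while leaving out the parent line yields a different ID.
orphan=$(git commit-tree -m second "$tree")

echo "original:       $child"
echo "same inputs:    $same"
echo "without parent: $orphan"
```

The tree, message, author, and dates are the same in all three cases; only the presence of the parent line differs, and that alone is enough to change the ID.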

What this means to you is that to get anything, including git filter-branch, to seem to delete some commits, you must copy every commit you want to keep to a new commit with a new, different ID (it has the same tree, message, and so on as before, but a different parent line). [2]

In essence, the result of doing git filter-branch is to make a new copy of the repository, with at least some, and maybe entirely, new and different commits in it. This in turn means that anyone else working with the old repository has to discard their old repository and switch to the new one.

git filter-branch

While git filter-branch has a lot of options, its core job boils down to this. For each commit: [3]

  • expand the commit's source tree
  • get the author and committer (name, email, and time stamps)
  • apply all the filters:
    • make any necessary changes to the tree
    • make any necessary changes to author and committer
    • keep or skip this particular commit: if keeping this commit, make a new commit from what's left
  • add an entry to the mapping file, "original SHA-1" to "new SHA-1"

The bullet-pointed list here is the "copy" step, after which there's one last task, "update references". To understand this part properly, you need to know how git's references work, but in short, branch names (and, if you add a --tag-name-filter, tag names as well) are checked to see if they pointed to an old commit that got rewritten. If so, they are changed to point to the new copy, or to the nearest new-copy commit in the case of skipped commits.
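The "update references" step can be observed directly. The sketch below is an illustration under assumptions, not the original author's code: it builds a two-commit throwaway repository with a made-up identity, rewrites every commit message with --msg-filter, and then compares the branch tip with the backup that filter-branch stores under refs/original/.

```shell
# Throwaway repository with an illustrative identity and fixed dates.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='946684800 +0000' GIT_COMMITTER_DATE='946684800 +0000'
# Newer git versions print a warning and pause before filter-branch runs;
# this variable suppresses that.
export FILTER_BRANCH_SQUELCH_WARNING=1

git commit -q --allow-empty -m one
git commit -q --allow-empty -m two

branch=$(git symbolic-ref HEAD)     # full ref name of the current branch
old_tip=$(git rev-parse HEAD)

# Prefix every commit subject; each commit is copied to a new object.
git filter-branch --msg-filter 'sed "1s/^/[rewritten] /"' -- --all >/dev/null

new_tip=$(git rev-parse HEAD)
backup=$(git rev-parse "refs/original/$branch")

echo "old tip: $old_tip"
echo "new tip: $new_tip"
echo "backup:  $backup"
git log --format='%h %s'
```

After the run, the branch name points at the new copy, while the old tip remains reachable through the refs/original/ namespace until that backup is deleted.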

To achieve what you want, you need to write a commit filter that uses the skip_commit function to omit the commits you want deleted (the first 200 and the merges), and uses git commit-tree on the rest. Be aware that, according to the git filter-branch documentation, skipping a merge commit makes the children of that merge become merge commits with the same parents, so skipping merges alone does not necessarily linearize the history. See the git filter-branch documentation for more details.

(One reason git filter-branch has so many options is that expanding and re-compressing entire source trees is very slow. The script attempts to avoid this, and if all your filters can be done within the index and commit graph, without expanding out the source trees, the filter completes much more quickly.)

Example implementation based on a new commit root:

The code below rewrites the repository so that it consists only of the specified STARTCOMMIT and the commits that descend from it. Branches and tags are kept.

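# Set this to the commit that should become the new root.
export STARTCOMMIT=.....

# Keep a commit only if STARTCOMMIT is its ancestor (or the commit itself);
# merge-base --is-ancestor exits with status 1 otherwise, and the commit is skipped.
git filter-branch --tag-name-filter cat \
   --commit-filter '
     git merge-base --is-ancestor "${STARTCOMMIT}" "${GIT_COMMIT}";
     if [ $? -eq 1 ];
     then
        skip_commit "$@";
     else
        git commit-tree "$@";
     fi' \
   -- --all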

# remove original references
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# reduce repo size
git reflog expire --expire=now --all && git gc --aggressive --prune=all

[1] The implication of the "cryptographic" adjective is that you can't simply make a slight change to the commit, e.g., adding text to the message, to produce the same old SHA-1 that you had before. The only way to do that in a computationally feasible time is to break the hash function.

[2] In less-intensive-change cases, if you make an exact copy of an original commit, you wind up with the same SHA-1 you had before. For instance, if you have a filter-branch operation that deletes the second-to-tip-most commit in a chain, only the tip-most commit gets a new SHA-1. In this particular case, though, we're proposing to delete the root commit, which necessarily renumbers every subsequent commit.

[3] The commits to be copied are obtained from the gitrevisions-style arguments you supply as part of the filter-branch operation. The branch names to rewrite are also taken from here, using the "positive references".

First, please think twice about whether you really want to do this. (Changing history, especially on a public repository, is usually a bad idea.)

You can use git rebase -i to do so. There you can use fixup to combine two commits into one, and edit to change a commit (including its author).

For automated changes on multiple commits you can use git filter-branch. But only use this if you know what you are doing.
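For the author email change mentioned in the question, an --env-filter is one way to automate it. The following is a hedged sketch run against a throwaway repository: the addresses old@example.com, new@example.com, and other@example.com and the commit_as helper are hypothetical and exist only for this illustration.

```shell
# Throwaway repository; all names and addresses below are made up.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
# Newer git versions print a warning and pause before filter-branch runs.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical helper: commit_as <email> <message>
commit_as() {
    GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL="$1" \
    GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL="$1" \
    git commit -q --allow-empty -m "$2"
}
commit_as old@example.com "by the old address"
commit_as other@example.com "by someone else"

# Replace one specific email wherever it appears as author or committer;
# commits by other addresses keep their identity.
git filter-branch --env-filter '
    if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
        export GIT_AUTHOR_EMAIL="new@example.com"
    fi
    if [ "$GIT_COMMITTER_EMAIL" = "old@example.com" ]; then
        export GIT_COMMITTER_EMAIL="new@example.com"
    fi
' -- --all >/dev/null

git log --format='%ae %ce %s'
```

As with any filter-branch run, every rewritten commit and all of its descendants get new IDs, so this is best tried on a fresh clone first.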
