git filter-branch - discard the changes to a set of files in a range of commits

Say I have a branch dev and I want to discard all the changes made to a set of files in the range of commits on dev since it diverged from master. If a commit in this range only touches those files, I'd like it pruned. The closest I got was:

git checkout dev
git filter-branch --force --tree-filter 'git checkout master -- \
a/b/c.png \
...
' --prune-empty -- master-dev-older-ancestor..HEAD

but this has these drawbacks:

  1. If the file has since been deleted in master, it fails with error: pathspec 'a/b/c.png' did not match any file(s) known to git. I might decide to git checkout master-dev-older-ancestor instead, but then,
  2. the file may not exist in master-dev-older-ancestor, having been merged from master back into dev at a later point;
  3. after all, I may want to discard changes to some files that are nowhere to be seen in master.

Fundamentally, the point is that I do not want to tell git to check out a specific version of the file. I want to tell git to filter all commits in the range master-dev-older-ancestor..HEAD so that all changes to an arbitrary set of files (present anywhere on master or not) are discarded.

So how do I tell git?

Fundamentally, what filter-branch does is this (everything else is optimization and/or edge cases): [1]

  • For each commit in the listed revision(s):
    1. check out that commit;
    2. apply the filter(s);
    3. create a new commit, which may or may not be the same as the old commit depending on step 2 (i.e., this new copy is a modified version of the old one, unless it's bit-for-bit identical, in which case the "newly created" commit is actually just the old commit after all).
  • For each "positive" ref on the command line, rewrite it to point to the new commit made in step 3 wherever it pointed to an old commit checked out in step 1.

Now let's consider your desired action, but I'm going to emphasize a different word:

filter all commits in [a] range ... to have all changes in an arbitrary set of files ... discarded

I emphasize "changes" here because each commit is a complete, stand-alone entity. Commits don't have "changes"; they just have files. The only way to see changes is to compare one specific commit against another specific commit: git diff commitA commitB, for example.

Thus, when you say "changes to some file(s)", the immediate, obvious question should be: changes with respect to what?

In most cases, people who talk about "changes in a commit" mean "changes in this commit with respect to its immediate ancestor": for simple (non-merge) commits, the patch you'd get with git show or git log -p. (Usually they have not thought about what they mean if the commit is a merge and therefore has multiple parents. For these, git show generally shows a combined diff of the merge commit against all its parents, but that may not match the user's intent here; see the git-show documentation for details.)

When using git filter-branch, you will have to define this (changes with respect to what) yourself. The filter-branch command gives you the SHA-1 ID of the checked-out commit in the environment variable $GIT_COMMIT, even if the commit is only "virtually" checked out in step 1 rather than actually stuffed into an on-disk tree. So, if your definition of "with respect to what" is "with respect to first parent", you can use gitrevisions syntax to refer to the parent: ${GIT_COMMIT}^ is the first parent, even when ${GIT_COMMIT} is a raw SHA-1.

A very crude and un-optimized --tree-filter that simply extracts the parent versions of each such file goes like this: [2]

for path in ...list-of-paths...; do
    git checkout -q "${GIT_COMMIT}^" -- "$path" 2>/dev/null
done
exit 0 # in case the last "git checkout" failed, override its status

which simply asks git to retrieve the parent commit's version of the file, discarding any error message that occurs because the file does not exist in the parent version. But this may not match your intent either: it's not clear whether you want to remove the file if it is not in the parent. Moreover, if a file is added or removed somewhere in the sequence of commits in your range, comparing each original commit only to its (single) original parent commit may misfire. For instance, if file foo does not exist in commit C5, does exist in C6, and remains unchanged in C7, the comparison between C7 and C6 says "file unchanged" while the earlier comparison of C5 to C6 says "file added". If your new (altered) C6 (let's call it C6' to tell them apart) removes foo because it was not in C5, presumably your C7' should also omit file foo.
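If you decide that a path missing from the parent should also be missing from the rewritten commit, the crude loop can be extended roughly as follows. This is an untested sketch: the function name restore_from_parent is made up for illustration, and it assumes it runs as a --tree-filter (so a plain rm is enough) with $GIT_COMMIT set by filter-branch. It still compares each commit only to its original parent, so the C6/C7 caveat above applies.

```shell
# Hypothetical helper for use inside a --tree-filter.
# For each path: take the first parent's version if the parent has one,
# otherwise delete the path from the working tree.
restore_from_parent() {
    for path in "$@"; do
        if git cat-file -e "${GIT_COMMIT}^:$path" 2>/dev/null; then
            # The path exists in the parent: extract that version.
            git checkout -q "${GIT_COMMIT}^" -- "$path"
        else
            # No such path in the parent (or no parent at all): remove it.
            rm -f -- "$path"
        fi
    done
    return 0
}
```

It would be called with the list of paths as arguments, for example restore_from_parent a/b/c.png.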

Another alternative is to compare each commit to the (single) commit just before the entire range. If your range covers commits C1, C2, C3, ..., C9, we can call the single previous commit C0. Then, instead of comparing C1 to C1^, C2 to C2^, and so on, we can compare C1 to C0, C2 to C0, C3 to C0, and so on. Depending on your definition of "changes", this may be exactly what you want, because "undoing a change" may be transitive: we remove foo in our new C6, therefore we must remove foo in our new C7 as well; we add back bar in the new C7, therefore we must add it back in the new C8 as well, and so on.

A less-crude version of the comparison script goes like this (this can be optimized for --index-filter as well, although I will leave that work to someone else since this is meant for illustration):

# Note: I haven't tested this either, not sure how it behaves if
# used inside git filter-branch.  As a --tree-filter you would not
# really want to "git rm" anything, just to "rm" it.  As an
# --index-filter you would want to "git rm --cached".  For
# checkout, as a tree filter you want to extract the file into
# the working tree, and as an index filter you want to extract
# the file into the index.
git diff --name-status --no-renames "$WITH_RESPECT_TO" "$GIT_COMMIT" \
    -- ...paths... |
while read -r status path; do
    # note: $path may have embedded white space, so we
    # quote it below to protect it from breaking into words
    case $status in
    A) git rm -- "$path";; # file was added, rm it to undo
    D|M) git checkout "$WITH_RESPECT_TO" -- "$path";; # deleted or modified
    *) echo "file $path has strange status $status, help!" 1>&2; exit 1;;
    esac
done

Explanation: the above assumes you're filtering a (maybe linear, maybe branch-y) series of commits C1, C2, ..., Cn. You want them to "not alter the contents or even existence" of some set of paths, with respect to some parent-of-C1 commit. You must set an appropriate specifier into $WITH_RESPECT_TO. (This can come from the environment, or just be hard-coded into an actual script. Note that for your --index-filter or --tree-filter, you can have the shell run a script, rather than trying to do it all in line.)

For instance, if you're filtering X..Y, which means "all commits reachable from label Y excluding all commits reachable from label X", it's possible that the appropriate value for $WITH_RESPECT_TO is simply X, but it is more likely the merge-base of X and Y. If X and Y are branches that look something like this:

...-o-o-o-o-o-o   <-- master
     \
      *-o-o       <-- X
       \
        o-o-o-o   <-- Y

then you're filtering the commits on the bottom row, and the first commit to be filtered should probably be "unchanged with respect to some paths as seen in commit *" (the commit I marked with an asterisk). That's the commit that git merge-base X Y would come up with.
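As a rough sketch of that step, the base can be computed once and exported so the filter script sees it. The function name with_respect_to and the branch names in the usage comment are placeholders, not anything filter-branch defines.

```shell
# Hypothetical helper: print the commit that "unchanged" should be
# measured against when filtering the range X..Y.
# Assumes both arguments resolve to commits in the current repository.
with_respect_to() {
    git merge-base "$1" "$2"
}

# Possible use before running filter-branch (names are placeholders):
#   WITH_RESPECT_TO=$(with_respect_to X Y)
#   export WITH_RESPECT_TO
```

Exporting the variable matters because filter-branch runs the filter in a child shell, which only sees exported variables.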

If you're working with raw SHA-1 IDs, you might be able to use something like:

WITH_RESPECT_TO=676699a0e0cdfd97521f3524c763222f1c30a094 \
git filter-branch ... (filter-branch arguments go here) ... -- \
    676699a0e0cdfd97521f3524c763222f1c30a094..branch

where the raw SHA-1 is the ID of commit *, as it were.

As for the git diff itself, let's look at the sort of output it produces:

$ git diff --name-status --no-renames \
>  2cd861672e1021012f40597b9b68cc3a9af62e10 \
>  7bbc4e8fdb33e0a8e42e77cc05460d4c4f615f4d
M       Documentation/RelNotes/1.8.5.4.txt
A       Documentation/RelNotes/1.8.5.5.txt
M       Documentation/git.txt
M       GIT-VERSION-GEN
M       RelNotes

(This is actual output of git diff on the source tree for git itself.) Between those two revisions, one release-notes text file was modified, one was added, Documentation/git.txt was modified, and so on. Now let's try that again but restricting it to one real pathname and one fake one:

$ git diff --name-status --no-renames \
>  2cd861672e1021012f40597b9b68cc3a9af62e10 \
>  7bbc4e8fdb33e0a8e42e77cc05460d4c4f615f4d \
>  -- Documentation/RelNotes/1.8.5.5.txt NoSuchFile
A       Documentation/RelNotes/1.8.5.5.txt

Now we find out about the one added file, but there is no complaint about the nonexistent file. So it's OK to give "nonexistent" paths; they simply won't occur in the output.

If diffing commit $WITH_RESPECT_TO against some later commit C says that path p is added in commit C, we know that it does not exist in $WITH_RESPECT_TO and does exist in C, so we want to remove it so that it's "unchanged". (This is the case for status-letter A.)

If the diff says that path p is deleted in C, we know that it does exist in $WITH_RESPECT_TO, and must be restored to remain "unchanged". (This is the case for status-letter D.)

If the diff says that path p exists in both, but the contents of the file differ in C, the contents must be restored to remain "unchanged". (This is the case for status-letter M.)

Other diff status letters are C, R, T, U, X, and B, but some cannot occur: we exclude C, R, and B by specifying appropriate git diff options; U only occurs during incomplete merges; and X should never occur (see What do the Git "pairing broken" and "unknown" statuses mean, and when do they occur?). The T case is possibly a reason to abort the filtering (a regular file changed to a symlink, or vice versa, for instance; or something replaced with a submodule).
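To make the handling of each letter explicit, the decision can be pulled out into a small function. This is only a sketch: the function name action_for_status and the action labels (remove, restore, abort) are invented here, and treating T as "abort" is one possible policy, not the only reasonable one.

```shell
# Hypothetical helper: map a "git diff --name-status" letter to the
# action the filter should take for that path.
action_for_status() {
    case $1 in
    A)   echo remove ;;   # added since the base: delete it to undo
    D|M) echo restore ;;  # deleted or modified: take the base version
    T)   echo abort ;;    # type change: stop and decide by hand
    *)   echo abort ;;    # anything else is unexpected here
    esac
}
```

The read loop could then dispatch on the printed label instead of on the raw status letter.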


If, after thinking about the issue for a while, you decide that "with respect to" should use parent commit(s), you can use git diff-tree, which, given a single commit, compares the tree of the commit with those of its parents. (But again, note its behavior on merge commits, and make sure that's what you want.)
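A minimal, untested sketch of the diff-tree variant follows. The function name changes_vs_parent is made up for illustration. By default git diff-tree prints nothing for merge commits (unless given -m or -c) and nothing for a root commit (unless given --root), so those cases need a deliberate choice.

```shell
# Hypothetical helper: list status letters and paths changed by one
# commit relative to its parent, limited to the given paths.
changes_vs_parent() {
    commit=$1
    shift
    # -r recurses into subdirectories; --no-commit-id drops the leading
    # commit ID line so only "status<TAB>path" lines remain.
    git diff-tree -r --no-commit-id --name-status --no-renames \
        "$commit" -- "$@"
}
```

Inside a filter it would be called as changes_vs_parent "$GIT_COMMIT" followed by the paths of interest, with its output fed to the same read loop as before.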


[1] When using --tree-filter, it actually does the full-blown check-everything-out part. With --index-filter it writes the commit into the index, but not actually into the file system, and lets you make all the changes within the index. With --env-filter, --msg-filter, --parent-filter, and --commit-filter, it lets you change the text, author, and/or parents of each commit. The --tag-name-filter lets you alter the tag names if needed, and causes the new names to point to the new commits instead of the old ones (hence --tag-name-filter cat leaves the names unchanged and makes those that pointed to the old commits now point to the new ones).

The --prune-empty option covers an edge case: if you have a chain of commits C1 <- C2 <- C3, and your C2' (your copy of C2) has the same underlying tree as your C1', comparing the trees of C2' and C1' produces an empty diff. The filter-branch operation normally keeps these, but omits them if you use --prune-empty: your new chain will then be C1' <- C3'. But note that the original chain may have "empty" commits; in this case, filter-branch will prune those even if the copies are actually the same as the originals.

[2] These scripts are written as if in script files. If you turn them into one-liners you will need to add semicolons as necessary, and perhaps also turn exit into return, since you don't want the whole thing to exit when eval'ed.
