git filter-branch - discard the changes to a set of files in a range of commits
Say I have a branch dev and I want to discard all the changes made to a set of files in the range of commits on dev since it diverged from master. If a commit in this range touches only those files, I'd like it pruned. The closest I got was:
git checkout dev
git filter-branch --force --tree-filter 'git checkout master -- \
a/b/c.png \
...
' --prune-empty -- master-dev-older-ancestor..HEAD
but this has these drawbacks. For one, git reports:
error: pathspec 'a/b/c.png' did not match any file(s) known to git.
I might decide to git checkout master-dev-older-ancestor instead, but then [...] dev at a later point [...]
Fundamentally the point is that I do not want to tell git to check out a specific version of the file - I want to tell git to filter all commits in the range master-dev-older-ancestor..HEAD to have all changes in an arbitrary set of files (present anywhere on master or not) discarded.
So how do I tell git?
Fundamentally, what filter-branch does is this (everything else is optimization and/or edge cases) [1]: for each commit in the range, it (1) checks out the commit, (2) applies your filters, and (3) makes a new commit from the result.
Now let's consider your desired action, but I'm going to emphasize a different word:
filter all commits in [a] range ... to have all changes in an arbitrary set of files ... discarded
I emphasize "changes" here because each commit is a complete, stand-alone entity. Commits don't have "changes", they just have files. The only way to see changes is to compare one specific commit against another specific commit: git diff commitA commitB, for example.
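To make that concrete, here is a minimal sketch in a throwaway repository. The file name, contents, and commit messages are made up for illustration; only the git commands themselves matter.

```shell
set -eu
repo=$(mktemp -d)          # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
commitA=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -am "second"
commitB=$(git rev-parse HEAD)

# Each commit holds the complete file, not a delta
contentA=$(git show "$commitA:file.txt")
contentB=$(git show "$commitB:file.txt")
echo "commitA has: $contentA, commitB has: $contentB"

# A "change" only shows up once two specific commits are compared
changed=$(git diff --name-status "$commitA" "$commitB")
echo "$changed"
```

The last command reports file.txt as modified, but that answer belongs to the pair (commitA, commitB), not to either commit alone.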
Thus, when you say "changes to some file(s)", the immediate obvious question should be: changes with respect to what?
In most cases, people who talk about "changes in a commit" mean "changes in this commit with respect to its immediate ancestor": for simple (non-merge) commits, the patch you'd get with git show or git log -p. (Usually they have not thought about what they mean if the commit is a merge, and therefore has multiple parents. For these, git show generally shows a combined diff of the merge commit against all its parents, but that may not match the user's intent here; see the git-show documentation for details.)
When using git filter-branch, you will have to define this (changes with respect to what) yourself. The filter-branch command gives you the SHA-1 ID of the checked-out commit in the environment variable $GIT_COMMIT, even if the commit is only "virtually" checked out in step 1 rather than actually stuffed into an on-disk tree. So, if your definition of "with respect to what" is "with respect to first parent", you can use gitrevisions syntax to refer to the parent: ${GIT_COMMIT}^ is the first parent, even when ${GIT_COMMIT} is a raw SHA-1.
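As a small sketch of the ^ suffix on a raw SHA-1: outside of filter-branch nothing sets GIT_COMMIT, so the assignment below is only a stand-in for what filter-branch would export to each filter. The repository and commit messages are made up.

```shell
set -eu
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "parent"
parent=$(git rev-parse HEAD)
git commit -q --allow-empty -m "child"

# Stand-in (assumption): inside a filter, filter-branch sets this for you
GIT_COMMIT=$(git rev-parse HEAD)

# The ^ suffix works on a raw SHA-1 just as it does on a branch name
first_parent=$(git rev-parse "${GIT_COMMIT}^")
echo "commit:       $GIT_COMMIT"
echo "first parent: $first_parent"
```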
A very crude and un-optimized --tree-filter that simply extracts the parent versions of each such file goes like this [2]:
for path in ...list-of-paths...; do
    git checkout -q "${GIT_COMMIT}^" -- "$path" 2>/dev/null
done
exit 0 # in case the last "git checkout" failed, override its status
which simply asks git to retrieve the parent commit's version of the file, discarding any error message that occurs because the file does not exist in the parent version. But this may not match your intent either: it's not clear whether you want to remove the file if it is not in the parent. Moreover, if a file is added or removed somewhere in the sequence of commits in your range, comparing each original commit only to its (single) original parent commit may misfire. For instance, if file foo does not exist in commit C5, does exist in C6, and remains unchanged in C7, the comparison between C7 and C6 says "file unchanged" while the earlier comparison of C5 to C6 says "file added". If your new (altered) C6 (let's call it C6' to tell them apart) removes foo because it was not in C5, presumably your C7' should also omit file foo.
Another alternative is to compare each commit to the (single) commit just before the entire range. If your range covers commits C1, C2, C3, ..., C9, we can call the single previous commit C0. Then, instead of comparing C1 to C1^, C2 to C2^, and so on, we can compare C1 to C0, C2 to C0, C3 to C0, and so on. Depending on your definition of "changes", this may be exactly what you want, because "undoing a change" may be transitive: we remove foo in our new C6, therefore we must remove foo in our new C7 as well; we add back bar in the new C7, therefore we must add it back in the new C8 as well, and so on.
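The difference between the two definitions can be seen in a short sketch that rebuilds the C5/C6/C7 situation in a throwaway repository. The commit messages and file contents are invented; foo and bar are the names used above.

```shell
set -eu
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "C5"     # foo does not exist yet
c5=$(git rev-parse HEAD)

echo data > foo
git add foo
git commit -q -m "C6"                   # foo is added here

echo other > bar
git add bar
git commit -q -m "C7"                   # foo is left untouched here
c7=$(git rev-parse HEAD)

# "With respect to the parent": C7 looks innocent as far as foo goes
vs_parent=$(git diff --name-status --no-renames "$c7^" "$c7" -- foo)

# "With respect to the commit before the range": foo is still an addition
vs_base=$(git diff --name-status --no-renames "$c5" "$c7" -- foo)

echo "C7 vs C6: '$vs_parent'"
echo "C7 vs C5: '$vs_base'"
```

The parent comparison prints nothing for foo, while the comparison against C5 still reports it as added, which is the transitive behavior described above.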
A less-crude version of the comparison script goes like this (this can be optimized for --index-filter as well, although I will leave the work up to someone else since this is meant for illustration):
# Note: I haven't tested this either, not sure how it behaves if
# used inside git filter-branch. As a --tree-filter you would not
# really want to "git rm" anything, just to "rm" it. As an
# --index-filter you would want to "git rm --cached". For
# checkout, as a tree filter you want to extract the file into
# the working tree, and as an index filter you want to extract
# the file into the index.
git diff --name-status --no-renames "$WITH_RESPECT_TO" "$GIT_COMMIT" \
    -- ...paths... |
while read -r status path; do
    # note: $path may have embedded white space, so we
    # quote it below to protect it from breaking into words
    case $status in
    A) git rm -- "$path";; # file was added, rm it to undo
    D|M) git checkout "$WITH_RESPECT_TO" -- "$path";; # deleted or modified
    *) echo "file $path has strange status $status, help!" 1>&2; exit 1;;
    esac
done
Explanation: the above assumes you're filtering a (maybe linear, maybe branch-y) series of commits C1, C2, ..., Cn. You want them to "not alter the contents or even existence" of some set of paths, with respect to some parent-of-C1 commit. You must set an appropriate specifier into $WITH_RESPECT_TO. (This can come from the environment, or just be hard-coded into an actual script. Note that for your --index-filter or --tree-filter, you can have the shell run a script, rather than trying to do it all in line.)
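For what it's worth, here is one way the --index-filter variant might look, as a sketch under stated assumptions rather than a finished tool. The repository, the file names (keep.txt, frozen.txt, added.txt), and the helper script are all hypothetical. Restoring an index entry with git ls-tree piped into git update-index --index-info is my choice for the index-only case, not something the script above uses. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer versions of git print before filter-branch starts.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning and delay in newer git

repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# C0: the commit that everything is compared against
echo base > keep.txt
echo base > frozen.txt
git add keep.txt frozen.txt
git commit -q -m "C0"
WITH_RESPECT_TO=$(git rev-parse HEAD)
export WITH_RESPECT_TO                   # the filter script reads this

git checkout -q -b dev
echo edit > frozen.txt
git commit -q -am "C1: touches only frozen.txt"
echo edit > keep.txt
echo new > added.txt
git add keep.txt added.txt
git commit -q -m "C2: edits keep.txt, adds added.txt"

# Hypothetical helper script, run once per commit by --index-filter
filter=$(mktemp)
cat > "$filter" <<'EOF'
git diff --name-status --no-renames "$WITH_RESPECT_TO" "$GIT_COMMIT" \
    -- frozen.txt added.txt |
while read -r status path; do
    case $status in
    A)   git rm -q --cached -- "$path";;           # drop the added entry
    D|M) git ls-tree "$WITH_RESPECT_TO" -- "$path" |
             git update-index --index-info;;       # restore the old entry
    *)   echo "file $path has strange status $status" 1>&2; exit 1;;
    esac
done
EOF

git filter-branch --force --prune-empty --index-filter "sh '$filter'" \
    -- "$WITH_RESPECT_TO"..dev >/dev/null

git log --format=%s "$WITH_RESPECT_TO"..dev       # commits that survived
git show dev:frozen.txt                           # content of frozen.txt on dev
```

In this sketch C1 touches only a filtered path, so its rewritten tree equals that of C0 and --prune-empty drops it; C2 survives with its keep.txt edit but without added.txt or the frozen.txt change.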
For instance, if you're filtering X..Y, which means "all commits reachable from label Y excluding all commits reachable from label X", it's possible that the appropriate value for $WITH_RESPECT_TO is simply X, but it is more likely the merge-base of X and Y. If X and Y are branches that look something like this:
...-o-o-o-o-o-o     <-- master
       \
        *-o-o       <-- X
         \
          o-o-o-o   <-- Y
then you're filtering the commits on the bottom row, and the first commit to be filtered should probably be "unchanged with respect to some paths as seen in commit *" (the commit I marked with an asterisk). That's the commit that git merge-base X Y would come up with.
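A short sketch of that topology, with made-up commit messages and a smaller number of commits than in the drawing, shows merge-base picking out the starred commit:

```shell
set -eu
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "trunk commit"

git checkout -q -b X
git commit -q --allow-empty -m "star"      # the commit marked * in the drawing
star=$(git rev-parse HEAD)
git commit -q --allow-empty -m "X only"

git checkout -q -b Y "$star"               # Y forks off at the starred commit
git commit -q --allow-empty -m "Y first"
git commit -q --allow-empty -m "Y second"

base=$(git merge-base X Y)
in_range=$(git rev-list --count X..Y)

echo "merge-base of X and Y: $base"
echo "commits in X..Y:       $in_range"
```

Here X..Y selects only the two commits made on Y, and the merge-base is the commit they both descend from, so it is a natural candidate for $WITH_RESPECT_TO.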
If you're working with raw SHA-1 IDs, you might be able to use something like:
WITH_RESPECT_TO=676699a0e0cdfd97521f3524c763222f1c30a094 \
git filter-branch ... (filter-branch arguments go here) ... -- \
    676699a0e0cdfd97521f3524c763222f1c30a094..branch
where the raw SHA-1 is the ID of commit *, as it were.
As for the git diff itself, let's look at the sort of output it produces:
$ git diff --name-status --no-renames \
> 2cd861672e1021012f40597b9b68cc3a9af62e10 \
> 7bbc4e8fdb33e0a8e42e77cc05460d4c4f615f4d
M Documentation/RelNotes/1.8.5.4.txt
A Documentation/RelNotes/1.8.5.5.txt
M Documentation/git.txt
M GIT-VERSION-GEN
M RelNotes
(this is actual output of git diff on the source tree for git itself). Between those two revisions, one release-notes text file was modified, one was added, Documentation/git.txt was modified, and so on. Now let's try that again, but restricting it to one real pathname and one fake one:
$ git diff --name-status --no-renames \
> 2cd861672e1021012f40597b9b68cc3a9af62e10 \
> 7bbc4e8fdb33e0a8e42e77cc05460d4c4f615f4d \
> -- Documentation/RelNotes/1.8.5.5.txt NoSuchFile
A Documentation/RelNotes/1.8.5.5.txt
Now we find out about the one added file, but there is no complaint about the nonexistent file. So it's OK to give "nonexistent" paths; they simply won't occur in the output.
If diffing commit $WITH_RESPECT_TO against some later commit C says that path p is added in commit C, we know that it does not exist in $WITH_RESPECT_TO and does exist in C, so we want to remove it so that it's "unchanged". (This is the case for status-letter A.)
If the diff says that path p is deleted in C, we know that it does exist in the first commit ($WITH_RESPECT_TO), and must be restored to remain "unchanged". (This is the case for status-letter D.)
If the diff says that path p exists in both, but the contents of the file differ in C, the contents must be restored to remain "unchanged". (This is the case for status-letter M.)
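The three cases can be reproduced side by side in a throwaway repository. The file names add.txt, del.txt, and mod.txt are invented so that each one lands in exactly one case.

```shell
set -eu
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo base > del.txt
echo base > mod.txt
git add del.txt mod.txt
git commit -q -m "base"
WITH_RESPECT_TO=$(git rev-parse HEAD)

git rm -q del.txt          # will show up as D
echo edit > mod.txt        # will show up as M
echo new > add.txt         # will show up as A
git add mod.txt add.txt
git commit -q -m "C"
C=$(git rev-parse HEAD)

statuses=$(git diff --name-status --no-renames "$WITH_RESPECT_TO" "$C" \
    -- add.txt del.txt mod.txt)
echo "$statuses"
```

Each output line is a status letter, a tab, and a path, which is the shape the while read loop in the script above consumes.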
Other diff status letters are C, R, T, U, X, and B, but some cannot occur (we exclude C, R, and B by specifying appropriate git diff options; U only occurs during incomplete merges; and X should never occur: see "What do the Git “pairing broken” and “unknown” statuses mean, and when do they occur?"). The T case is possibly cause to abort the filtering (a regular file changed to a symlink, or vice versa, for instance; or something replaced with a submodule).
If, after thinking about the issue for a while, you decide that "with respect to" should use parent commit(s), you can use git diff-tree, which, given a single commit, compares the tree of the commit with those of its parents. (But again, note its behavior on merge commits, and make sure that's what you want.)
[1] When using --tree-filter, it actually does the full-blown check-everything-out part. With --index-filter it writes the commit into the index, but not actually into the file system, and lets you make all the changes within the index. With --env-filter, --msg-filter, --parent-filter, and --commit-filter, it lets you change the text, author, and/or parents of each commit. The --tag-name-filter lets you alter the tag names if needed, and causes the new names to point to the new commits instead of the old ones (hence --tag-name-filter cat leaves the names unchanged and makes those that pointed to the old commits now point to the new ones).
The --prune-empty option covers an edge case: if you have a chain of commits C1 <- C2 <- C3, and your C2' (your copy of C2) has the same underlying tree as your C1', comparing the trees of C2' and C1' produces an empty diff. The filter-branch operation normally keeps these, but omits them if you use --prune-empty: your new chain will then be C1' <- C3'. But note that the original chain may have "empty" commits; in this case, filter-branch will prune those even if the copies are actually the same as the originals.
[2] These scripts are written as if in script files. If you turn them into one-liners you will need to add semicolons as necessary, and perhaps also turn exit into return, since you don't want the whole thing to exit when eval-ed.