
Using git filter-branch for specific commits

I'm trying to use the git filter-branch feature to remove a file that was recently updated and committed. I tried running the following command:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch myfile' --prune-empty --tag-name-filter cat -- 6f7fda9..HEAD

However, this only removes the file from the master branch, and I want it removed from all branches.

Starting with commit 6f7fda9 to HEAD I want the file removed. Is the command I'm running wrong?

git filter-branch -- --all runs the filter on all refs, which includes every branch. So:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch myfile' --prune-empty --tag-name-filter cat -- --all

I want [the file] removed from all branches

It's important to realize that branches are almost (but not quite) irrelevant. What matters are the commits.

You literally cannot change any existing commit, and Git does not try. What git filter-branch does is that it copies commits. That is, for each commit to be filtered, Git extracts the original into a temporary work area, applies your filter(s), and then makes a new commit from the result.

If the new commit is bit-for-bit identical to the original commit, it re-uses the actual underlying object in the repository database. If not (and the whole point of filtering is to end up with "not"), the original commit remains, while the new copy gets a new, different hash ID. If we use uppercase letters to stand in for commit hash IDs, and remember that each commit stores the hash ID of its parent commit, we can draw the originals this way:

... <-F <-G <-H <-I   <-- master

A branch name like master remembers the hash ID of the last commit. That commit remembers the hash ID of its parent, which remembers another hash ID of another parent, and so on: master lets Git find commit I, which finds commit H, which finds commit G, and so on.
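
To see this chain for yourself, here is a minimal sketch in a throwaway repository. The temporary directory, the demo identity, and the one-letter commit messages are my own illustrative assumptions, not anything from the question.

```shell
# Throwaway repository; path and identity are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Four empty commits standing in for F, G, H and I.
for name in F G H I; do
    git commit -q --allow-empty -m "$name"
done

tip=$(git rev-parse master)       # the hash ID stored under the name master
parent=$(git rev-parse master^)   # the parent hash recorded inside that commit
git cat-file -p "$tip"            # raw commit object, including its "parent <hash>" line
git log --format='%h %s' master   # walks I, H, G, F by following the parent links
```

The name only stores one hash; everything else is found by walking backwards from it.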

With git filter-branch we tell Git: extract commit F and maybe make some change to it and then re-commit. If nothing changes in F, we stick with the actual hash ID. Then we have Git extract commit G and make some change. This time, perhaps we remove a sensitive file. So we make a new commit that's like G but different: it gets a new, different hash ID, which we can call G'. Commit G' still has commit F as its parent:

...--F--G--H--I   <-- master
      \
       G'

We then extract H and apply the filter. Even if nothing else changes, we need our new commit to point back to G', so filter-branch ensures that this happens, and therefore we get a commit H' that points back to G'. We repeat for I and the result is:

...--F--G--H--I   <-- master
      \
       G'-H'-I'

The final step is for git filter-branch to rewrite each of the branch names. The name master must now point to commit I', with its new and different hash, not to shabby old icky I.
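
The copying described above can be observed directly. This is a sketch under assumptions: the repository, the file keep.txt, and the demo identity are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the advisory pause that newer Git versions print.

```shell
# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

echo keep > keep.txt && git add keep.txt && git commit -q -m F
echo secret > myfile && echo more >> keep.txt \
    && git add myfile keep.txt && git commit -q -m G   # G introduces the unwanted file
echo again >> keep.txt && git commit -q -a -m H

old_F=$(git rev-parse master~2)
old_tip=$(git rev-parse master)

git filter-branch --force --index-filter \
    'git rm --cached --ignore-unmatch myfile' --prune-empty -- master

new_F=$(git rev-parse master~2)   # F was not changed by the filter, so the object is re-used
new_tip=$(git rev-parse master)   # H', a copy with a different hash
backup=$(git rev-parse refs/original/refs/heads/master)   # the old tip is kept here
echo "F: $old_F -> $new_F"
echo "tip: $old_tip -> $new_tip (backup: $backup)"
```

Note that the original commits are still in the repository, reachable through refs/original/, until you delete that backup ref and the objects are eventually pruned.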

The names that git filter-branch rewrites at the end of its processing are all the names you identified positively on the command line. This part is a little tricky: git filter-branch takes, as one or more of its arguments, strings that are suitable for git rev-list. These can be positive references like master, or negative references like ^develop or ^6f7fda9.

A negative reference tells Git: don't bother with these commits. If you use ^6f7fda9 to skip commit 6f7fda9 and anything "before" (graph-wise) that commit, git filter-branch will not have to spend any computer-time working on that commit.

The expression 6f7fda9..HEAD is shorthand for ^6f7fda9 HEAD, and HEAD here resolves to the current branch name. So this is one positive reference to a branch name (such as master), and one negative reference by hash ID.
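
You can check the shorthand with git rev-list itself. In this sketch the repository is a throwaway, and the variable base is a hypothetical stand-in for 6f7fda9.

```shell
# Throwaway repository; path and identity are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in one two three four; do
    git commit -q --allow-empty -m "$name"
done

base=$(git rev-parse master~2)            # stand-in for 6f7fda9
range=$(git rev-list "$base"..HEAD)       # two-dot shorthand
explicit=$(git rev-list "^$base" HEAD)    # one negative, one positive reference
[ "$range" = "$explicit" ] && echo "same commits"

git symbolic-ref HEAD                     # the branch name HEAD is attached to
```

Both forms select the same two commits here, and the only positive name involved is the branch that HEAD is attached to.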

You can name all your branch names with --branches. You can name all your references (including things that are not branch names) with --all. Filter-branch will only rewrite the positive references, but it will rewrite all of them. Be a bit careful with --all, since it can also rewrite refs such as refs/stash.
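
To get a feel for how much wider --all is than --branches, it can help to list the refs first. This sketch uses git for-each-ref to show the names involved; the branch develop, the tag v1, and the stash entry are assumptions created just for the demonstration.

```shell
# Throwaway repository; names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt && git add file.txt && git commit -q -m first
git branch develop
git tag v1
echo two >> file.txt && git stash -q      # creates refs/stash

branches=$(git for-each-ref --format='%(refname)' refs/heads)   # what --branches covers
all_refs=$(git for-each-ref --format='%(refname)')              # refs that --all covers
echo "$branches"
echo "$all_refs"

git rev-list --count --branches   # commits reachable from branches only
git rev-list --count --all        # also counts the commits behind refs/stash
```

If the second list contains names you do not want rewritten, --branches (or an explicit list of names) is probably the safer choice.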

When you do rewrite any branch, tag, or other name that refers to some commit that does contain the file you don't want to have, you'll get things like:

                    tip2   [abandoned]
                   /
...--good--bad--...--tip   [abandoned]
       \
        copied--...--tip'   <-- branch1
                   \
                    tip2'   <-- branch2

If you don't rewrite some name that points anywhere to any of the commits from bad on down (rightward), those names will still point to the "bad" commits that have the file you want to be rid of. (Remember that in these particular graph drawings that I do on Stack Overflow, earlier / parent commits are to the left, later / child commits are to the right.)
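
Here is a sketch of that situation: two branch names reach the "bad" commit, but only master is given to filter-branch. The repository, keep.txt, and the name branch2 are illustrative assumptions. Asking git rev-list for commits that touch myfile shows which names still lead to the file.

```shell
# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

echo keep > keep.txt && git add keep.txt && git commit -q -m good
echo secret > myfile && echo more >> keep.txt \
    && git add myfile keep.txt && git commit -q -m bad
git branch branch2                        # a second name that reaches "bad"
echo again >> keep.txt && git commit -q -a -m tip

# Only master is a positive reference here, so only master is rewritten.
git filter-branch --force --index-filter \
    'git rm --cached --ignore-unmatch myfile' --prune-empty -- master

left_master=$(git rev-list master -- myfile)     # empty: no commit on master touches myfile
left_branch2=$(git rev-list branch2 -- myfile)   # still lists the original "bad" commit
echo "master: ${left_master:-clean}"
echo "branch2: ${left_branch2:-clean}"
```

Running the same check with --all instead of a branch name will also report the backups under refs/original/, which keep the old commits alive until you remove them.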

Your requirements as stated are contradictory. Specifically:

I want it removed from all branches.

and

Starting with commit 6f7fda9 to HEAD I want the file removed.

need to be reconciled. I suspect this comes down to an inaccurate understanding of commit ranges, which are only sort-of a thing in git.

Consider this commit graph:

x -- 6f7fda9 -- A -- B -- C -- F <--(master)
                 \                        ^(HEAD)
                  D -- E <--(branch)

So HEAD is at master which is at F; and there's a branch which was (apparently) created from A (after 6f7fda9 but before HEAD).

Now the question is, given this graph what does 6f7fda9..HEAD mean? And unfortunately, the answer isn't what a lot of people intuitively think.

6f7fda9..HEAD is short for HEAD ^6f7fda9, meaning "everything reachable from HEAD but not reachable from 6f7fda9". "Reachable" means "the commit itself, and any commits you find by following parent pointers". So in this case, it means A, B, C, and F; but not x or 6f7fda9 (because they're reachable from 6f7fda9) and also not D or E (because they aren't reachable from HEAD).
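
The graph above is easy to rebuild and query. In this sketch the commits are empty, the commit messages match the letters in the drawing, and the tag base is a hypothetical stand-in for 6f7fda9.

```shell
# Throwaway repository reproducing the drawing; the tag "base" stands in for 6f7fda9.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m x
git commit -q --allow-empty -m base && git tag base
git commit -q --allow-empty -m A
git branch branch                          # branch created from A
for name in B C F; do git commit -q --allow-empty -m "$name"; done
git checkout -q branch
for name in D E; do git commit -q --allow-empty -m "$name"; done
git checkout -q master                     # HEAD is at master again

git log --format=%s base..HEAD             # F, C, B, A: no D or E
git log --format=%s ^base --branches       # A through F: every branch tip is a positive reference
```

The second command is the kind of revision list that makes filter-branch visit D and E as well.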

There are several ways to get filter-branch to process all the branches. For example, you could run:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch myfile' --prune-empty --tag-name-filter cat -- --all

But this will include all refs (not just all branches); if that's a problem, restrict it to branches:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch myfile' --prune-empty --tag-name-filter cat -- --branches

One other caveat: if you specifically don't want commits before 6f7fda9 rewritten, then you need to include one or more negative commit references. But assuming you do intend to include 6f7fda9 itself, you'd exclude its parent (not the commit itself):

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch myfile' --prune-empty --tag-name-filter cat -- ^6f7fda9^ --branches

If 6f7fda9 is a merge, you'd have to list negative commit references for each of its parents.
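
To illustrate the merge case, this sketch builds a small merge in a throwaway repository (the branch name side and the commit messages are assumptions) and compares excluding one parent with excluding both.

```shell
# Throwaway repository with a single merge commit at HEAD.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m base
git checkout -q -b side
git commit -q --allow-empty -m side
git checkout -q master
git commit -q --allow-empty -m main
git merge -q --no-ff -m merge side         # HEAD is now a merge with two parents

git rev-list --parents -n 1 HEAD           # prints the merge hash followed by both parent hashes
p1=$(git rev-parse HEAD^1)
p2=$(git rev-parse HEAD^2)

one_excluded=$(git rev-list --count "^$p1" HEAD)          # merge plus the commit on side
both_excluded=$(git rev-list --count "^$p1" "^$p2" HEAD)  # only the merge itself
echo "$one_excluded $both_excluded"
```

Excluding only the first parent leaves the second parent's history in the list, which is why each parent needs its own negative reference.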

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 