
filter-branch that includes all commits of renamed files

Using filter-branch as explained here, it's possible to split out a subdirectory into a new repository. However, the suggested filter only keeps commits in which the files in the directory had the same name/path as they do now.

git filter-branch --prune-empty --subdirectory-filter FOLDER-NAME  BRANCH-NAME

I need a filter that keeps, for each file, the same commits that gitk --follow FILE-NAME shows.

Essentially, I need a filter that keeps the commits for both the current filename/path and the older filenames/paths of each file in the directory.

I tried:

git filter-branch --index-filter 'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- FOLDER-NAME' --prune-empty -- --all

but it had the same effect as --subdirectory-filter.

You will have to write your own, fancy (perhaps very fancy), filter.

By this I don't mean modifying git filter-branch to add a new --EJS-filter. That would be one way to do it, but it would require you to get ridiculously fancy. :-) Instead, what I mean is this:

  • Find the name(s) you want to preserve in particular commit range(s).

    This might be relatively easy: perhaps from commits "after" a123456 onward you keep all files named newdirname/*, and from commits "before" a123456, including a123456 itself, you keep all files named olddirname/*.

    It might even be extremely easy: perhaps in commits after a123456 there are no files named olddirname/*, and in commits up to that point there are no files named newdirname/*. In this case, your filter devolves to: retain all files named olddirname/* or newdirname/*; remove all other files.

    If it's not that easy, well, then it's not that easy.

  • Now that you have identified which files to keep, write a --tree-filter (slow, but very easy to write) or --index-filter (much faster, but harder to write) that retains the files you want and deletes the files you don't want.
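For the "extremely easy" case, a minimal --index-filter sketch might look like the following. It simply extends the index-filter from the question to two pathspecs. olddirname and newdirname are the placeholder names used above, and split_renamed_dir is a hypothetical helper name, not part of Git. Run it only in a scratch clone, since filter-branch rewrites every ref you pass it.

```shell
# Hypothetical helper: keep only <old>/ and <new>/ in every rewritten commit.
# Arguments: old dir, new dir, then rev-list arguments for filter-branch (e.g. --all).
# Assumes the directory names contain no single quotes.
split_renamed_dir() {
    old=$1
    new=$2
    shift 2
    # "\$GIT_COMMIT" is left for filter-branch to expand for each commit;
    # $old and $new are substituted now, wrapped in single quotes.
    git filter-branch --prune-empty --index-filter "
        git rm --cached -r -q --ignore-unmatch -- . &&
        git reset -q \"\$GIT_COMMIT\" -- '$old' '$new'
    " -- "$@"
}

# Usage (in a throwaway clone): split_renamed_dir olddirname newdirname --all
```

Commits that touched neither directory end up with the same tree as their parent and are dropped by --prune-empty. This only gives the intended result under the "extremely easy" condition: if an unrelated olddirname/ reappears after a123456, it is kept too.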

Here, you can (and maybe must) make use of the way that git filter-branch operates. When you run git filter-branch, what Git does is:

  1. Enumerate every commit reachable from the branch(es) you're filtering, according to whatever gitrevisions-style criteria you specify. With just --all at the end, enumerate every commit reachable from every ref (every branch, tag, and so on).

  2. Put that list of commits into "reverse topological order", i.e., start with the root commit (the very first commit ever made), then list its immediate children, then their children, and so on. Hence for a commit chain that looks in part like this:

                G--H   <-- branch1
               /
      ...--E--F
               \
                I--J   <-- branch2

    the list would end with E, F followed by either G, H, I, J or I, J, G, H.

  3. For every commit in the list, do these steps (more or less):

    • check out the commit by its hash ID, saved temporarily in the variable $GIT_COMMIT;
    • run each --whatever-filter;
    • make a new commit from whatever's left;
    • update a map: "old $GIT_COMMIT becomes new hash ID just made".

    In other words, git filter-branch simply copies every commit. With --prune-empty, it skips the "make a new commit" step if the new commit would have the same tree as its parent commit.

    In any case, the new commit's parent commit(s) is/are the mapped IDs. That is, if we start from original commit A, which is a root commit and has no parents, we make a new commit A':

     A--B--C--...

     A'   <-- (just made)

    Then we copy B, since that's the only child of A. When we make the new commit B' (through git filter-branch), we make A' rather than A the parent of B':

     A--B--C--...

     A'-B'   <-- (just made)

    When we make C' from C, we'll have B' as its parent. Or, if due to the --prune-empty rule we don't actually make B' after all, we'll set the parent of C' to A'.
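To make steps 1 to 3 concrete, here is a toy model of that copy loop built from plumbing commands. It is not filter-branch's actual code: it runs no filters (each copy reuses the original tree), does not preserve author/committer identity or dates, and updates no refs. toy_copy_commits is a hypothetical name; it prints the old-to-new map, one "old new" pair per line.

```shell
# Toy model of filter-branch's copy loop (NOT the real implementation).
# Assumes the rev-list arguments reach back to the root commit(s), e.g. --all,
# so every parent has already been copied by the time its child is reached.
toy_copy_commits() {
    map=$(mktemp) || return 1
    # Steps 1 and 2: enumerate commits, parents before children.
    git rev-list --reverse --topo-order "$@" |
    while read -r commit; do
        # Step 3: a real --index-filter or --tree-filter would change this tree.
        tree=$(git rev-parse "$commit^{tree}")
        parent_args=
        nparents=0
        last_parent=
        # Remap each original parent to the ID of its copy.
        for p in $(git log -1 --format=%P "$commit"); do
            newp=$(awk -v c="$p" '$1 == c { print $2 }' "$map")
            parent_args="$parent_args -p $newp"
            nparents=$((nparents + 1))
            last_parent=$newp
        done
        if [ "$nparents" -eq 1 ] &&
           [ "$(git rev-parse "$last_parent^{tree}")" = "$tree" ]; then
            # --prune-empty analogue: no new commit; map to the parent's copy.
            new=$last_parent
        else
            # Make the new commit from "whatever's left" (here: the same tree).
            new=$(git log -1 --format=%B "$commit" | git commit-tree "$tree" $parent_args)
        fi
        # Update the map: old ID becomes the new ID just made (or reused).
        echo "$commit $new" >> "$map"
    done
    cat "$map"
    rm -f "$map"
}
```

Running toy_copy_commits --all in a small repository prints one line per original commit; a commit whose tree matches its parent's maps to the parent's copy, just as C' gets A' as its parent when B' is skipped.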

What this means is that you can use $GIT_COMMIT , if necessary, to decide which file name(s) you wish to keep. You can test it against all the existing commits in the repository, or build your own map of names to keep based on commit hashes, or whatever you like.
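One way to build a list of names to keep, rather than hard-coding them, is to ask git log --follow (the git log counterpart of the gitk --follow FILE-NAME view from the question) for every path each current file has ever had. This is a sketch under stated assumptions: historic_paths is a hypothetical helper, it inherits --follow's rename-detection heuristics, and it does not key names by commit, so an old name later reused for an unrelated file would be kept too.

```shell
# Hypothetical helper: print every path that any file currently under the
# given directory has had in history, following renames like `git log --follow`.
# --follow accepts only one file at a time, hence the loop.
# Assumes plain file names (no newlines or characters that git ls-files quotes).
historic_paths() {
    git ls-files -- "$1" |
    while IFS= read -r f; do
        # --format= suppresses the commit header; --name-only prints the
        # file's name as of each commit that touched it.
        git log --follow --format= --name-only -- "$f"
    done |
    sed '/^$/d' |   # drop the blank separator lines
    sort -u
}

# Usage: historic_paths newdirname > keep-paths.txt
```

The resulting names could then stand in for the directory names given to git reset inside the --index-filter.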

Note that in general, since you are dealing with a directed acyclic graph (DAG) of commit hashes, the test you usually want to use to implement these things is "is ancestor". If commit A is an ancestor of commit B (more precisely, if A ≼ B, so that you allow for A = B), then you have a "before" situation. This is what I described above: commits "before and including" a123456 use an older name, while subsequent commits use a newer one. The git merge-base --is-ancestor A B command performs this A ≼ B test, with its result delivered as a shell command exit status (0 if A ≼ B, 1 if not):

if git merge-base --is-ancestor "$GIT_COMMIT" a123456; then ...; else ...; fi

tests whether $GIT_COMMIT ≼ a123456.
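Putting the pieces together, a sketch of an --index-filter that keeps olddirname/ in commits up to and including a123456 and newdirname/ everywhere else might look like this. split_at_rename is a hypothetical helper; a123456, olddirname and newdirname are the placeholders from the text. "Everywhere else" here means every commit that is not an ancestor of (or equal to) the boundary, which includes commits on unrelated side branches, not only its descendants.

```shell
# Hypothetical helper. Arguments: boundary commit (e.g. a123456), old dir,
# new dir, then rev-list arguments for filter-branch (e.g. --all).
# Assumes the directory names contain no single quotes.
split_at_rename() {
    # Resolve the boundary to a full commit hash once, up front.
    boundary=$(git rev-parse --verify "$1^{commit}") || return 1
    old=$2
    new=$3
    shift 3
    # Inside the filter: commits that are ancestors of (or equal to) the
    # boundary keep $old; every other commit keeps $new.
    git filter-branch --prune-empty --index-filter "
        if git merge-base --is-ancestor \"\$GIT_COMMIT\" $boundary; then
            keep='$old'
        else
            keep='$new'
        fi
        git rm --cached -r -q --ignore-unmatch -- . &&
        git reset -q \"\$GIT_COMMIT\" -- \"\$keep\"
    " -- "$@"
}

# Usage (in a throwaway clone): split_at_rename a123456 olddirname newdirname --all
```

Because each commit gets exactly one directory name, a stray olddirname/ created after the boundary is dropped, unlike with a filter that simply keeps both names in every commit.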
