
Rebase a branch with master with commits removed from master

Let's say your master looked like

1 2 3 4 5

Where 1 ~ 5 are separate revisions. Now branchX looks like

1 2 3 4 5 6 7

Then, for some reason, some commits were removed from master, so now

1 2 4 5

is how master looks (3 was removed).

I want to rebase branchX with master

It should look like

1 2 4 5 6 7

Edit: In this simple example only two commits (6 and 7) were added, but in my real scenario I have 200 commits added to branchX.

This is difficult, or sometimes even impossible, in general. It's easier—in fact, sometimes much easier—if you add some constraints.

TL;DR: conclusions

If you have a suitable reflog entry, such as master@{1}, the command sequence is just:

$ git checkout branchX
$ git rebase --onto master master@{1}

If not, we must find an appropriate upstream limit commit:

$ limit=$(git rev-list --topo-order --cherry master...branchX |
  sed -n -e 's/=//p' | head -1)
$ echo "$limit"    # if this is empty, there's no equivalent commit and you are SOL
$ git checkout branchX       # same as before
$ git rebase --onto master "$limit"

How we got here

First, remember that a branch name, in Git, names every commit reachable from the branch tip (the tip being the commit to which the branch name itself points). Reachability here is determined by the arcs in the DAG, ie, which commits are considered ancestors of which later commits.

Remember also that the true name of each commit is its SHA-1 ID; these are all unique and are determined by the read-only contents of the commit object. It's impossible to remove a commit from the middle of a chain: you can only copy all its descendants to new (different) commits, with the copies of its direct children pointing back to the removed commit's parent(s) and the copies of later descendants pointing to the corresponding copied parent(s).

Your scenario says that you actually had this:

A--B--C--D--E   <-- master
             \
              F--G   <-- branchX

where each commit's parent is found by following the direct links in a generally leftward direction. The (single) parent of G is F; F's parent is E; E's parent is D, and so on, back to A, which has no parents at all (it is a root commit).

The set of commits reachable from master is ABCDE. The set of commits reachable from branchX is ABCDEFG. The way you and Git can talk about "commits on branchX" without getting ABCDE is to use, not just branchX, but rather master..branchX. This is the set of commits reachable from branchX minus the set reachable from master.

Then, to "delete" commit C from master , this must have happened:

     D'-E'  <-- master
    /
A--B--C--D--E   [master was this before the copies]
             \
              F--G   <-- branchX

Here D' and E' are actually copies of D and E. The originals remain in the repository, and are still reachable from branchX. The expression master..branchX no longer works, though, because master now names E' and its ancestors, i.e., A-B-D'-E'. Subtracting those commits from A-B-C-D-E-F-G (set algebra allows subtracting something that was not there in the first place) leaves C-D-E-F-G, which is not what you want.
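The effect on master..branchX can be reproduced in a throwaway repository. This is only a sketch: the temporary directory, the make_commit helper, the one-file-per-commit layout, and the demo identity are all made up for illustration, and using git rebase --onto to "remove" C is just one way master might have been rewritten.

```shell
set -e
# Throwaway repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text
git config user.name "Demo"
git config user.email "demo@example.com"

# One file per commit, so no commit depends on another.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

for c in A B C D E; do make_commit "$c"; done
git checkout -q -b branchX
for c in F G; do make_commit "$c"; done
git checkout -q master

# Before the rewrite: master..branchX is just F and G.
before=$(git log --format=%s master..branchX | sort | tr -d '\n')

# "Remove" C from master by copying D and E on top of B.
git rebase -q --onto master~3 master~2 master

# After the rewrite: the same expression now also includes C, D and E.
after=$(git log --format=%s master..branchX | sort | tr -d '\n')

echo "before: $before"
echo "after:  $after"
```

The commit subjects are sorted only to make the two results easy to compare; the order in which git log lists them is not the point here.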

How to get what you want

The basic problem comes down to identifying commit E. If we can find commit E, we can write E..branchX, i.e., the set of all commits reachable from branchX, minus the set reachable from commit E. But how shall we find E?

If you are the one who re-pointed the name master to commit E', this could be very easy. All you have to do is save the SHA-1 hash of E somewhere first. In fact, if you're the one who rewrote master this way, you did save it, in the reflog you have for your master. The reflog entries are master@{1}, master@{2}, and so on. You can view these with git reflog master. [1] Each reflog entry also has a date-and-time stamp, so you can write master@{yesterday} or master@{1.week.ago} to look up the appropriate numbered entry based on a relative date.

That's the easiest way by far, and it works in all cases, even if E is the commit that was "removed". Note that when we "remove" commit C, we must copy D and E to D' and E'. That is because those two commits were descendants of C that were reachable from master. Should we decide to remove E, though ... well, what are the children of E that are reachable from master?

That's right: there are no such commits. We can simply point master back at commit D, leaving ABCD on master, and E apparently unique to branchX now. Any time we adjust our master like this, though, we make a reflog entry to keep the previous value, so once again, we can simply look in the reflog to discover that E is the interesting commit.
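A small sketch of the reflog lookup, again in a throwaway repository (the directory, helper function, file names, and identity are invented for the demo). It assumes master was rewritten locally with a single git rebase, so that the previous tip ends up in master@{1}; in a real repository the entry you need may be master@{2} or older.

```shell
set -e
# Throwaway repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

for c in A B C D E; do make_commit "$c"; done
git checkout -q -b branchX
for c in F G; do make_commit "$c"; done
git checkout -q master

old_tip=$(git rev-parse master)            # this is commit E

# Rewrite master so that C is gone (D and E are copied onto B).
git rebase -q --onto master~3 master~2 master

# The reflog still remembers where master pointed before the rewrite.
from_reflog=$(git rev-parse 'master@{1}')
on_branch=$(git rev-list --count 'master@{1}..branchX')

git reflog master                          # inspect the entries by eye
echo "previous tip: $from_reflog"
echo "commits unique to branchX: $on_branch"
```

With the old tip recovered, master@{1}..branchX is again just F and G, which is exactly the range that git rebase --onto master master@{1} copies.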

The problem here comes in if (a) we didn't adjust master ourselves or (b) we did do it, but so long ago that our reflog entries have expired. (For an entry like the one for commit E, which is no longer reachable from the current master tip, this occurs after 30 days by default, per gc.reflogExpireUnreachable; entries that are still reachable last 90 days, per gc.reflogExpire.) In this case, we can only find E if there is some copy E' in the new chain. Even then, we can still only find it if the copy E' has the same patch ID as E.

Patch IDs are how git cherry, and hence git rev-list's --cherry-pick and --cherry-mark options, work. We make (or Git makes) the assumption that when a commit is copied, usually it's copied with no significant changes, such that a hash ID computed by examining a slightly stripped-down git show of the commit will come up with the same hash ID for the original and for the copy. Commits whose patch IDs match are called patch-equivalent, and Git treats the paired-up commits as, in some sense, "equal".
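To see patch IDs directly, one can pipe git show into git patch-id, which prints the patch ID followed by the commit ID. The sketch below does this in a throwaway repository; the directory, the make_commit and patch_id helpers, the file names, and the identity are all hypothetical.

```shell
set -e
# Throwaway repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

for c in A B C D E; do make_commit "$c"; done
git checkout -q -b branchX
for c in F G; do make_commit "$c"; done
git checkout -q master
git rebase -q --onto master~3 master~2 master   # drop C; copy D and E

# First field of git patch-id's output is the patch ID itself.
patch_id() { git show "$1" | git patch-id | cut -d' ' -f1; }

id_E=$(patch_id branchX~2)      # original E, still reachable from branchX
id_E_copy=$(patch_id master)    # the copy E' at the tip of master
id_D=$(patch_id branchX~3)      # original D, a different change

echo "E : $id_E"
echo "E': $id_E_copy"
echo "D : $id_D"
```

E and E' have different commit IDs (their parents differ) but the same patch ID, because they introduce the same change; D introduces a different change and so gets a different patch ID.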

We also must [2] make use of the symmetric difference notation, master...branchX or branchX...master. Because it's symmetric, it doesn't really matter which order we use (except for the whole left vs right part in --left-right in git rev-list, which we will generally want). What it does, in any case, is to produce the following set algebra operation:

A...B = (reachable(A) | reachable(B)) - (reachable(A) & reachable(B))

That is, produce the set of commits reachable from either branch tip, excluding those commits that are reachable from both branch tips. Hence, given:

     D'-E'  <-- master
    /
A--B--C--D--E--F--G   <-- branchX

the symmetric difference gives us D', E', C, D, E, F, G.

Hence, if we run git rev-list master...branchX, we will get this complete set of commits. All we have to do now is see that D' = D and E' = E, and somehow choose E from this set. So now we add --cherry-mark to the git rev-list command: this marks D' and E' and D and E with = characters, and marks C, F, and G with + characters. Here I have run it on a repo that isn't quite as detailed: in effect I just have E and E' plus one unique commit.

$ git rev-list --cherry-mark master...two
=dcbcb2774954437ef0906c6770c7deb924d9286e
+0af7c6a3cf5e49928de132c341c848be80ab84c7
=643b37ef242fdc35dfdd4551b42393af3eb91a85

OK so far, but there's an obvious problem: this lists both E and E', and we only wanted E. Well, let's back up a moment and do this other rev-list variant:

$ git rev-list --left-right master...two
>dcbcb2774954437ef0906c6770c7deb924d9286e
<0af7c6a3cf5e49928de132c341c848be80ab84c7
<643b37ef242fdc35dfdd4551b42393af3eb91a85

This marks each commit, not with + or =, but rather with < (left) or > (right). The commit on branch two that is "the same as" the one on master is in fact dcbcb27.... The commit on master that is the same as the one on two is 643b37e.... This left/right distinction gives us a way to identify which commit is E and which one is E': the one we care about, as the cutoff for discarding, is the one on branchX, so whichever side of the symmetric difference we put branchX on, that's the side to take.

Now we can make use of one more rev-list option: --left-only or --right-only. These may be used in combination with --cherry-mark, hence:

$ git rev-list --left-only --cherry-mark master...two
+0af7c6a3cf5e49928de132c341c848be80ab84c7
=643b37ef242fdc35dfdd4551b42393af3eb91a85

or:

$ git rev-list --right-only --cherry-mark master...two
=dcbcb2774954437ef0906c6770c7deb924d9286e

Thus, we can run this command and pick out just the =-marked commit(s) to find D and E.
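The same marking can be checked against the full A-through-G picture. The following is a sketch in a throwaway repository (directory, helper, file names, and identity are hypothetical), with branchX placed on the right-hand side of the three dots.

```shell
set -e
# Throwaway repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

for c in A B C D E; do make_commit "$c"; done
git checkout -q -b branchX
for c in F G; do make_commit "$c"; done
git checkout -q master
git rebase -q --onto master~3 master~2 master   # drop C; copy D and E

# Everything in the symmetric difference: D', E', C, D, E, F, G.
total=$(git rev-list --count master...branchX)

# Only the branchX side, with patch-equivalence marks.
marked=$(git rev-list --right-only --cherry-mark master...branchX)
equal_count=$(printf '%s\n' "$marked" | grep -c '^=')   # D and E
plus_count=$(printf '%s\n' "$marked" | grep -c '^+')    # C, F and G

printf '%s\n' "$marked"
echo "total=$total equal=$equal_count plus=$plus_count"
```

Swapping --right-only for --left-only would instead list the master side, where the only commits in the symmetric difference are the copies D' and E'.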

In fact, there's a shorthand for --right-only --cherry-mark (though it also adds --no-merges), spelled --cherry. We can put the branch we want (branchX) on the right and use this:

$ git rev-list --cherry master...branchX

Again, this spits out both + and = commits. We want to find the = ones, so we run this through sed, telling it to remove = and print lines, or not print lines if there is no = to remove:

$ git rev-list --cherry master...branchX | sed -n -e 's/=//p'

and this will list the IDs of commits D and E.

We only really want E (and we can use head -1 to get it, provided we make sure we get the commits in topological-sort order), but in fact, it doesn't entirely hurt to exclude D as well. But if we're going to use git rebase to copy the branchX commits, we really do want to find just E, so our final command is:

$ limit=$(git rev-list --topo-order --cherry master...branchX | 
  sed -n -e 's/=//p' | head -1)

Now we can run our final git rebase command:

$ git checkout branchX    # if needed
$ git rebase --onto master "$limit"

This rebases, i.e., copies, commits that are on the current branch, i.e., branchX, excluding the limiting commit and anything earlier (hence excluding E and earlier), with the copies going after (--onto) master.
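Putting the pieces together, here is an end-to-end sketch in a throwaway repository. The directory, the make_commit helper, the one-file-per-commit layout, and the identity are hypothetical; because each commit touches its own file, the rebase cannot conflict, which a real 200-commit branch will not guarantee.

```shell
set -e
# Throwaway repository; every name in here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

for c in A B C D E; do make_commit "$c"; done
git checkout -q -b branchX
for c in F G; do make_commit "$c"; done
git checkout -q master
git rebase -q --onto master~3 master~2 master   # drop C; copy D and E

expected_limit=$(git rev-parse branchX~2)       # the original E, for checking

# Newest commit on branchX that has a patch-equivalent copy on master.
limit=$(git rev-list --topo-order --cherry master...branchX |
  sed -n -e 's/=//p' | head -n 1)
[ -n "$limit" ] || { echo "no patch-equivalent commit found" >&2; exit 1; }

git checkout -q branchX
git rebase -q --onto master "$limit"

# Subjects from the root to the tip of the rebased branch.
result=$(git log --reverse --format=%s branchX | tr -d '\n')
echo "$result"
```

The explicit emptiness check matters: if limit is empty and left unquoted, git rebase --onto master runs with no upstream argument at all and falls back to whatever upstream is configured for the branch, which is not what is intended here.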

Note, though, that it's possible that there are no patch-equivalent commits in the symmetric difference. In this case, if you are quite certain there was a removed commit, you will have to find the limiting commit (E in our example) yourself, through some other non-automated method. Once you find the "commit E", the rest goes just as before, using the hash ID as the limit for the --onto master rebase.


[1] Note that git reflog branch actually just runs git log -g --oneline branch. This means you can run the same git log command but omit --oneline, or replace it with a --pretty=format:... or --format=... directive to make up your own format, vs the standard --oneline format.

[2] OK, "should". :-) It is technically possible to do this manually, running git patch-id on each commit yourself. But given that git rev-list does it for you, automatically, why bother? [3]

[3] Stubbornness and/or obstinacy.
