
Why does git pull origin develop --rebase cause conflict when git pull origin develop doesn't?

Now normally I use

git pull origin develop

to get the latest updates from the develop branch. Recently, my team has been transitioning to rebasing instead of merging, so I'm a bit confused about some things. Before, my workflow was pretty straightforward: I would first check out the develop branch and use

git checkout -b feature/foo

I would then make my changes, commit, and push them. Usually the develop branch would have changed in the meantime, so I would use

 git pull origin develop

to get the latest changes and have conflicts only if other people modified the same file. However, when I use

git pull origin develop --rebase

I notice that I get conflicts with my own branch even though I'm the only person who has modified it. Is there a particular reason for this? Is there a way to avoid these merge conflicts that I have with my own branch?

First, let's note that git pull mainly consists of running two Git commands. It's meant to be a convenience operation, letting you type git pull instead of git fetch, Enter, git ... . The first command is always git fetch, and the second is your choice: it defaults to git merge, but you can choose git rebase. When you want to rebase, it takes almost as much typing to run the one command as the two, so it's not really very convenient after all, and I suggest running git fetch and the second command separately, at least until you're very familiar with Git. 1
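As a rough sketch of that two-step form, the script below builds a throwaway sandbox and then runs git fetch followed by git rebase, which is approximately what git pull origin develop --rebase does. The directory names (upstream, mine), the file names, and the identities are made up for illustration; only the branch names come from the question.

```shell
#!/bin/sh
# Sandbox demo: all paths, file names and identities here are illustrative.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"

# "upstream" plays the role of the shared repository behind origin.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/develop
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "base" > shared.txt
git add shared.txt
git commit -q -m "base"
cd ..

# Our clone: branch off develop and commit some work.
git clone -q upstream mine
cd mine
git config user.name "Me"
git config user.email "me@example.com"
git checkout -q -b feature/foo
echo "my work" > mine.txt
git add mine.txt
git commit -q -m "my work"

# Meanwhile, develop moves on in the shared repository.
(cd ../upstream && echo "their work" > theirs.txt \
    && git add theirs.txt && git commit -q -m "their work")

# The two separate steps:
git fetch -q origin            # step 1: update origin/develop
git rebase -q origin/develop   # step 2: replay our commits on top of it

new_parent=$(git rev-parse HEAD^)
upstream_tip=$(git rev-parse origin/develop)
echo "feature/foo now sits on top of: $(git log -1 --format=%s HEAD^)"
```

Running the steps separately also gives you a chance to inspect origin/develop (for instance with git log) before deciding whether to merge or rebase.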

So your question really resolves to a simpler one: Why does rebase sometimes have conflicts that merge doesn't have? And there's an answer to that, which is actually fairly simple: Rebase is mainly just repeated cherry-picking, and cherry-picking is a form of merging . So when you merge, you have one place where you can get conflicts. If you rebase ten commits, you have ten places where you can get conflicts. The conflicts themselves can be different as well, but the sheer scale of opportunity is the major factor here.


1 In repositories with submodules, git pull can recurse into the submodules, in which case it's more than two commands and its convenience aspect becomes significant. You can also configure git pull to run git rebase by default, making the convenience re-appear even without submodules. I still encourage new users to use two separate commands, though: the syntax for git pull is a little weird and a little different from almost all other Git stuff, and it gets confusing too easily. Too much magic is attributed to pull, when actually all the magic is in the second command, and you need to learn merge to understand rebase.


Merging

Although the implementation is full of tricky little twists and turns, the idea behind merging is simple. When we ask Git to merge, we have "our work" and "their work". Git needs to figure out what we changed, what they changed, and combine those changes.

In order to do that, Git needs to find a common starting point. A commit isn't a set of changes at all: it's actually a snapshot. Git can show one of these snapshots as differences from its immediate predecessor, ie, extract both snapshots and see what's different. So if we started from some commit with some hash ID B , and they also started from that same commit:

          C--D   <-- our-branch (HEAD)
         /
...--A--B
         \
          E--F   <-- their-branch

then Git can compare the snapshot in B to our latest, D , and to their latest, F . Whatever's different in B -vs- D is stuff we changed. Whatever's different in B -vs- F is stuff they changed. Git then combines the changes, applies the combined changes to the snapshot from the merge base B , and commits the result, hooking it up with not one but two predecessors:

          C--D
         /    \
...--A--B      G   <-- our-branch (HEAD)
         \    /
          E--F   <-- their-branch

To get there, Git has to run:

  • git diff --find-renames hash-of-B hash-of-D (what we changed)
  • git diff --find-renames hash-of-B hash-of-F (what they changed)

When Git goes to combine these two diffs, there can be places where we and they changed the same lines of the same file . If we didn't make the same change to those lines, Git will declare a conflict and stop the merge in the middle, not make commit G yet , and force us to clean up the mess and finish the merge to create G .
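The merge steps can be tried out in a scratch repository. This is only a sketch: the file notes.txt and its contents are invented, and each side is collapsed to a single commit (labelled D and F) instead of two. It asks Git for the merge base, prints the two diffs described above, and then lets git merge combine them.

```shell
#!/bin/sh
# Scratch repository; file name and contents are illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/our-branch
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit B: the shared starting point.
printf '%s\n' alpha bravo charlie delta echo foxtrot golf > notes.txt
git add notes.txt
git commit -q -m "B"
git branch their-branch

# Our side: change the first line.
printf '%s\n' "alpha (ours)" bravo charlie delta echo foxtrot golf > notes.txt
git commit -q -am "D"

# Their side: change the last line.
git checkout -q their-branch
printf '%s\n' alpha bravo charlie delta echo foxtrot "golf (theirs)" > notes.txt
git commit -q -am "F"
git checkout -q our-branch

# The two diffs Git combines, both taken against the merge base.
base=$(git merge-base our-branch their-branch)
git --no-pager diff --find-renames "$base" our-branch     # what we changed
git --no-pager diff --find-renames "$base" their-branch   # what they changed

# The changes touch different lines, so they combine without conflict into G.
git merge -q --no-edit their-branch
parent_count=$(git log -1 --format=%P | wc -w | tr -d ' ')
first_line=$(head -n 1 notes.txt)
last_line=$(tail -n 1 notes.txt)
echo "merge commit has $parent_count parents"
```

If both sides had edited the same line differently, git merge would instead stop with a conflict and leave G uncommitted.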

Cherry-picking

The idea behind cherry-pick is to copy a commit. To copy a commit, we can have Git turn it into a set of changes:

  • git diff --find-renames hash-of-parent hash-of-commit

We can then take these changes and hand-apply them somewhere else, ie, to some other commit. For instance, if we have:

          C--D   <-- our-branch (HEAD)
         /
...--A--B
         \
          E--F   <-- their-branch

and we like what they did in F , but don't want E itself yet, we can diff E vs F , to see what they did. We can use that to try to make the same change to our snapshot in D . Then we make ourselves a new commit—let's call it F' to mean copy of F :

          C--D--F'  <-- our-branch (HEAD)
         /
...--A--B
         \
          E--F   <-- their-branch

But if we made significant changes in C , or they made significant changes in E , it may be hard to get the changes they made from E -to- F to line up with what's in our snapshot in D . For Git to help us out, and do this copying automatically , Git would like to know: what's different between E and D ? That is, Git wants to run:

  • git diff --find-renames hash-of-E hash-of-D (what we have in D, vs E)
  • git diff --find-renames hash-of-E hash-of-F (what they changed in F )

But wait, we just saw this same pattern above, during git merge ! And in fact, that's precisely what Git does here: it uses the same code as git merge , it just forces the merge base—which would be B for a regular merge—to be commit E , the parent of commit F that we're cherry-picking. Git now combines our changes with their changes, applying the combined set of changes to the snapshot in the base—in E —and making the final F' commit on its own, but this time as a regular commit.

The new commit re-uses the commit message from commit F itself too, so that the new commit F' (which has some new hash ID, different from F 's) resembles F a lot: git show probably shows the same, or a very similar, diff listing for each, and of course the same commit log message.

As with git merge, this merging process (what I like to call merge as a verb) can go wrong. If it does go wrong, Git complains about the merge conflict, stops with the merge unfinished, and makes you clean up the mess and commit. When you do commit, Git knows you're finishing up a git cherry-pick and copies the commit message for you at that point, to make F'.
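Here is a small sandbox version of the cherry-pick picture, under the assumption that each commit only adds its own file so nothing conflicts. The helper commit_file and all file names are hypothetical, and our side is collapsed to a single commit D. It copies F onto our branch without bringing E along.

```shell
#!/bin/sh
# Scratch repository; helper and file names are illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/our-branch
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write text $2 into file $1 and commit it as $3.
commit_file() {
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file base.txt "shared start" "B"
git branch their-branch
commit_file ours.txt "our work" "D"

git checkout -q their-branch
commit_file e.txt "their earlier work" "E"
commit_file f.txt "the change we want" "F"
git checkout -q our-branch

# Copy only F (the tip of their-branch); E stays where it is.
git cherry-pick their-branch >/dev/null

orig_hash=$(git rev-parse their-branch)
copy_hash=$(git rev-parse HEAD)
copy_msg=$(git log -1 --format=%s)
echo "F is $orig_hash, its copy F' is $copy_hash, message: $copy_msg"
```

The copy carries F's message and F's change (f.txt), but has a different hash and does not contain e.txt.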

Rebase is repeated cherry-picking

To do a git rebase target , Git:

  • lists the commits you have on your branch that are not reachable (a technical term: see Think Like (a) Git) from target;
  • trims this list if appropriate—see below;
  • checks out commit target as a "detached HEAD";
  • repeatedly, one commit at a time, uses git cherry-pick to copy each commit that's in the list. 2

Once all the to-be-copied commits have been copied successfully, Git moves the branch name to the end of the copied list.

Suppose we start with a similar setup to before, though I'll list a few more commits here:

          C--D--E--F   <-- our-branch (HEAD)
         /
...--A--B
         \
          G--H   <-- their-branch

We run git rebase their-branch , so Git lists out the commits to copy: CDEF , in that order. Then Git checks out commit H as a "detached HEAD":

          C--D--E--F   <-- our-branch
         /
...--A--B
         \
          G--H   <-- their-branch, HEAD

Now Git will cherry-pick C to copy it. If that goes well:

          C--D--E--F   <-- our-branch
         /
...--A--B
         \
          G--H   <-- their-branch
              \
               C'  <-- HEAD

Git repeats for D , E , and F . Once it's done D and E we're in this state:

          C--D--E--F   <-- our-branch
         /
...--A--B
         \
          G--H   <-- their-branch
              \
               C'-D'-E'  <-- HEAD

After Git finishes copying F to F' , the last step of rebase is to yank the name our-branch over to point to the final copied commit, and re-attach HEAD to it:

          C--D--E--F   [abandoned]
         /
...--A--B
         \
          G--H   <-- their-branch
              \
               C'-D'-E'-F'  <-- our-branch (HEAD)

Each cherry-pick does one three-way merge, with the merge base of the operation being the parent of the commit being copied and the "ours" commit being the one on the detached HEAD —note that initially that's their commit H , and as we progress, it becomes "their commit H plus our work" over time. The "theirs" commit is, each time, our own commit. Each cherry-pick can have all the usual merge conflicts, though in most cases, most don't have any.
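The walk-through above can be reproduced in a scratch repository. In this sketch every commit just adds a file named after itself (an assumption made so that no cherry-pick conflicts), and the variable names are my own. It records the old tip, rebases, and then checks what the copies look like.

```shell
#!/bin/sh
# Scratch repository; each commit adds one file named after the commit.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/their-branch
git config user.name "Demo"
git config user.email "demo@example.com"

add_commit() {  # hypothetical helper: commit a new file called $1.txt
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

add_commit B
git checkout -q -b our-branch
for name in C D E F; do add_commit "$name"; done
git checkout -q their-branch
for name in G H; do add_commit "$name"; done
git checkout -q our-branch

old_tip=$(git rev-parse our-branch)
git rebase -q their-branch

new_tip=$(git rev-parse our-branch)
# Subjects of the copied commits, oldest first.
subjects=$(git log --reverse --format=%s their-branch..our-branch | xargs)
# Rebase leaves the pre-rebase tip in ORIG_HEAD (it is also in the reflog).
abandoned=$(git rev-parse ORIG_HEAD)
echo "copied: $subjects"
```

The four copies keep their messages but get new hashes, and the original F is still reachable through ORIG_HEAD or the reflog until it is eventually garbage-collected.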

There are two cases in particular that are especially bad. One of these, probably the most common, is when any of your own commits, in the list CDEF for instance, are themselves cherry-picks of something that was in the GH chain (which is often rather longer than just two commits)—or vice versa, eg, perhaps H is essentially D' .

If you, or they, were able to make that cherry-pick earlier easily, without conflicts, your copy probably looks almost exactly like, or even 100% exactly like, one of the GH chain. If that's the case, Git can recognize that it is such a copy, and remove it from the "to be copied" list. In our example here, if H is really D' , and Git can see that, Git will remove D from the to-be-copied list, and only copy CEF . But if not—if, for instance, they had to change their copy of D a bunch to make H —then Git will try to copy D and these changes almost certainly will conflict with their modified H .
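To see the "already copied" detection in action, here is a sketch in which their branch contains an exact cherry-pick of our D. The file names are invented. git cherry marks commits whose patch already exists on the other side with a leading "-", and the rebase then copies only the remaining commits.

```shell
#!/bin/sh
# Scratch repository; each commit adds one file named after the commit.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/their-branch
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "B"

git checkout -q -b our-branch
for name in C D E; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done

# They copy our D verbatim; this plays the role of "H is essentially D'".
git checkout -q their-branch
git cherry-pick our-branch~1 >/dev/null
git checkout -q our-branch

# "-" lines are commits whose patch is already present on their-branch.
already_upstream=$(git cherry their-branch | grep -c '^-')

git rebase their-branch >/dev/null 2>&1
copied=$(git rev-list --count their-branch..our-branch)
echo "already upstream: $already_upstream, copied by rebase: $copied"
```

Here the copy is byte-for-byte the same change, so Git can drop D. Had they modified their copy, the patches would no longer match and Git would try to replay D anyway.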

If you merge rather than copying, you will compare B vs H (theirs) and B vs F (yours) and the chances of conflicts are perhaps reduced. Even if there are conflicts, they're probably more obvious and easier to resolve. If the conflicts are because of an unnecessary copy, they tend, in my experience, to look trickier.

The other common problem case is when, in your CDEF chain, your last few commits were something you did specifically in order to make merging easier. That is, someone may have said something like "we changed the foo subsystem, now you need a third parameter", and you added the third parameter in F after cherry-picking the change in E. You'll get conflicts when copying C and D. Git may skip copying E because it is a cherry-pick, and then copying F is unnecessary after you've fixed the conflicts in C and D, but that's two copies that require fixing, one that is automatically dropped, and one that requires your own, manual drop.

So, in the end, git merge does one merge, but git rebase does many cherry-picks, each of which is—internally—a merge, and each of which can result in merge conflicts. It's not surprising that rebases get more conflicts!
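One way to reproduce the situation from the question, where a rebase conflicts on a branch only you have touched while a merge does not, is a branch that changes a line and later changes it back. The sketch below is one such constructed case, not the only cause; settings.txt and its values are invented. The merge only looks at the end states, while the rebase replays the intermediate commit.

```shell
#!/bin/sh
# Scratch repository; file name and values are illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"
git config user.email "demo@example.com"

echo "timeout=10" > settings.txt
git add settings.txt
git commit -q -m "B: base"

# Our branch: C changes the line, D changes it back and adds a file.
git checkout -q -b feature/foo
echo "timeout=20" > settings.txt
git commit -q -am "C: try timeout=20"
echo "timeout=10" > settings.txt
echo "notes" > notes.txt
git add notes.txt
git commit -q -am "D: back to timeout=10, add notes"

# develop moves on: someone else changes that same line.
git checkout -q develop
echo "timeout=30" > settings.txt
git commit -q -am "H: timeout=30"

# Merge compares only B-vs-D and B-vs-H; our net change to the line is nil.
git checkout -q -b merge-test feature/foo
if git merge -q --no-edit develop >/dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
fi

# Rebase replays C first, and C touches the line that H changed.
git checkout -q feature/foo
if git rebase develop >/dev/null 2>&1; then
    rebase_result=clean
else
    rebase_result=conflict
    git rebase --abort
fi
echo "merge: $merge_result, rebase: $rebase_result"
```

Squashing C and D into one commit before rebasing (for instance with an interactive rebase onto the branch's own starting point) would remove the intermediate state and, in this constructed case, the conflict with it.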


2 Technically, a plain (non-interactive) git rebase often doesn't use git cherry-pick. Instead, it uses, in effect, git format-patch ... | git am ... . Using git rebase -i always uses git cherry-pick, and git rebase -m forces a non-interactive git rebase to use git cherry-pick. The fact that plain rebase avoids it is mainly just a holdover from ancient (pre-2008-or-so, probably) Git, before cherry-pick was taught to do a proper three-way merge. (Since Git 2.26 the merge-based machinery is the default, and the format-patch route is used only when you ask for it with git rebase --apply.)

The git am step uses -3 , so that if a patch fails, Git will "fall back" to a three-way merge. The result is usually the same, but the format-patch-pipe-to-am method never finds renamed files. This makes the format-patch style faster , but not as good.
