
git merge is overwriting main with the branch being merged

Backstory:

  1. My professor gave me a programming assignment with some template boilerplate code → I completed it and submitted it.
  2. My professor then added code to his original template code and told everyone to please redo the assignment.

I figured this would be a good opportunity to use a git merge:

What I did:

  1. I did git checkout -b linkedListUpdate
     • Then pasted his new code template over my original code.
     • Then I did git add .
  2. I did git commit -m "Added professor's update to a new branch"
  3. I did git checkout main
     • Then I made a minor adjustment to the code (I changed a random comment)
     • Then I did git add .
  4. I did git commit -m "separating branches..."
  5. FINALLY I try to merge them: git merge linkedListUpdate

What I expected: A bunch of merge conflicts for me to resolve to pop up.

What actually happened: linkedListUpdate overwrote what I had in my main branch, leaving me with just my professor's template code.

Side Question:

What is the better way to separate main from the other branch so that merge forces conflict resolution instead of a fast-forward?

(committing a comment change just to adjust the 'geometry' of the branches seems kinda wrong)

The only real error here is in your expectations: you expected merge conflicts, but there were not going to be any, and there were in fact none. It was definitely a good exercise for you though, as you've just hit on several very important questions!

There are a few background things you should start with, when setting up your own expectations here. Not all of these are strictly required but they help in terms of getting accurate mental images:

  • A Git repository is mostly a collection of commits and other supporting objects. I say "mostly" because there's also a collection of names (branch, tag, etc., names), which help Git (and you) find particularly-interesting commits, and when working with a usable repository—as opposed to a server-side one that you might find on GitHub, for instance—there's also an area in which you do your work, and then there's a whole host of smaller auxiliary items that are useful for all kinds of things.

  • A commit holds two things:

    • Directly, each commit has some metadata, or information about the commit itself.
    • Indirectly (not that you need to care about this part), the commit stores a full snapshot of every file, in a frozen-for-all-time format that only Git can read, and literally nothing can write. The file contents stored in this format are compressed and, importantly, de-duplicated. So if you make a commit, then change just one file and make another commit, it's true that both commits store all the files, but it's also true that the new commit has re-used the files from the earlier commit, except for the one you changed.
  • All objects, but especially commit objects (you'll rarely if ever deal with the other ones directly), have a hash ID. This is the key by which Git stores the object, in its big key-value database, and hence the key by which Git actually retrieves commits. Git needs the key to look up the commit. 1

  • A branch name, in Git, is a distinguished kind of name (kept in a separate namespace, apart from tag names for instance, so that you could have both a branch xyz and a tag xyz, though that's still a bad idea anyway). All of Git's names are stored in a second key-value database with the full name as the key (the full name of branch B is refs/heads/B) and one hash ID as the value. You only get one hash ID, but that's all Git needs.
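The points in the list above can be poked at directly with a few plumbing commands. This is only a sketch: the throwaway repository, the file names code.c and notes.txt, and the identity settings are all made up for illustration.

```shell
set -e
# Throwaway repository in a temporary directory (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

printf 'template\n' > code.c
git add code.c
git commit -q -m "Initial template"

# A branch name is a ref: full name as the key, one commit hash ID as the value.
tip=$(git rev-parse refs/heads/main)
git for-each-ref refs/heads/

# The object database is keyed by hash ID; this object's type is "commit".
objtype=$(git cat-file -t "$tip")

# The commit's metadata: tree (the snapshot), author, committer, message.
git cat-file -p "$tip"

# A second commit that leaves code.c untouched re-uses the same stored file.
blob1=$(git rev-parse HEAD:code.c)
printf 'notes\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
blob2=$(git rev-parse HEAD:code.c)
echo "code.c in first commit:  $blob1"
echo "code.c in second commit: $blob2"
```

The two printed hash IDs for code.c should be identical, which is the de-duplication described above.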

The graph that you see in your image has round dots representing commits, and labels in oblong (rectangular with rounded corners) boxes representing names. The names point to the commits by storing the commit hash IDs.

As a plain-text image, I would draw the same thing like this:

          J   <-- main
         /
...--G--H   <-- origin/main
         \
          I   <-- linkedListUpdate

where each of these uppercase letters stands in for a raw commit hash ID (we avoid trying to type these in as they're kind of unusable, eg, 9bf691b78cf906751e65d65ba0c6ffdcd9a5a12c ). The metadata for any one given commit, such as H here, contains the raw hash ID of the commit that comes right before it, eg, G . So these are actually backwards-pointing arrows:

... <-F <-G <-H   <-- origin/main

with the name origin/main giving us quick direct access to commit H , and commit H itself giving us (and Git) indirect access to commit G , which in turn gives us access to earlier commit F , and so on.

Git says that the commits that are reachable from a name, working backwards, are "on" the branch. So commit H and everything earlier, here, are on all the branches. 2 Commit I is only on linkedListUpdate and commit J is only on main.
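To see "reachable from a name" in action, here is a small sketch that rebuilds the H, I, J shape in a scratch repository and asks Git which branch names contain each commit. The file names and commit messages are invented for the example, and origin/main is left out because there is no remote here.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

printf 'shared\n' > base.txt
git add base.txt
git commit -q -m "H"
H=$(git rev-parse HEAD)
git branch linkedListUpdate

printf 'mine\n' > mine.txt
git add mine.txt
git commit -q -m "J"
J=$(git rev-parse HEAD)

git checkout -q linkedListUpdate
printf 'theirs\n' > theirs.txt
git add theirs.txt
git commit -q -m "I"
I=$(git rev-parse HEAD)

# Which branch names can reach each commit by walking backwards?
echo "H is on:"; git branch --contains "$H"
echo "I is on:"; git branch --contains "$I"
echo "J is on:"; git branch --contains "$J"
on_h=$(git branch --contains "$H" | wc -l)
on_i=$(git branch --contains "$I" | wc -l)
```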

With all this in mind, let's take a look at git merge , and also at the distinction between a true merge and the fake, non-merge-y merges that you are already aware of and asking about in your side question.


1 There are maintenance commands that can (slowly and painfully) trawl through the entire database, but as these can take many minutes, you wouldn't want that to be the normal mode of getting-work-done.

2 Whether origin/main counts as a branch depends on who you ask and what they're thinking at this moment: in particular, the name origin/main is not a branch name , but rather a remote-tracking name living in the refs/remotes/ namespace. You can easily extract this particular commit, because it has a name to find it in one step, but you cannot get "on" origin/main as a branch because it's not a branch name .

Ultimately, the word branch is badly overused in Git, and it's often a good idea to qualify exactly what you mean when you say "branch".


True merges

Given a starting setup like this:

          J   <-- main (HEAD)
         /
...--G--H
         \
          I   <-- linkedListUpdate

(the attached HEAD shows which branch you're "on", as in git status would say on branch main ), you run:

git merge linkedListUpdate

What happens? Let's start with the goal: The goal of a merge is to combine work. But now we have to think about this. What does work even mean?

Let's get a little more abstract and look at a situation where there are two or more commits in each "branch" that branches off from some common starting point:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

Each commit holds a full snapshot, so the only way we can see what changed is to run git diff or similar, to compare two commits.

We could compare commits J and L directly, but all that tells us is what's different . It doesn't say who did which kind of work. Suppose we added some lines to some files on "our" branch br1 , and they—whoever they are—added different lines to the same files on their branch. The diff from our commit J to their commit L will say to delete the lines we added and add the lines they added. That's not right!

Swapping the commits, so that we compare L and J directly, does not help: now we delete the lines they added, and add our lines. Clearly we need something fancier.

We could compare H to I to see what we changed in I, then compare I to J to see what we changed in J. That at least gets us "the work we did". If there are many commits, we'd have a lot of individual comparisons to do here. But (hang on a minute!) every commit is a full snapshot of every file. What if we just compare H to J directly? We'll see "what we did", completely ignoring all the "noise" of how we got there, with intermediate commit I (and maybe a dozen more that we don't show).

The same trick works for comparing H to L , to see what they changed. That's the work they did. The only really hard part is coming up with the best shared commit , but in this case that's obvious: it's commit H . 3 Git calls this best-shared-commit the merge base .
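A rough sketch of that comparison, assuming a scratch repository with made-up branch names br1 and br2 and a nine-line file called code.c: git merge-base reports the shared commit, and diffing from it to each tip shows each side's work separately.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

printf '%s\n' one two three four five six seven eight nine > code.c
git add code.c
git commit -q -m "H"
H=$(git rev-parse HEAD)

# Our side: commits I and J on br1.
git checkout -q -b br1
printf '%s\n' ONE two three four five six seven eight nine > code.c
git commit -q -am "I"
printf '%s\n' ONE TWO three four five six seven eight nine > code.c
git commit -q -am "J"

# Their side: commits K and L on br2, also starting from H.
git checkout -q -b br2 "$H"
printf '%s\n' one two three four five six seven eight NINE > code.c
git commit -q -am "K"
printf '%s\n' one two three four five six seven EIGHT NINE > code.c
git commit -q -am "L"

# Tip-to-tip: mixes both sides' work together, so it is not what we want.
git diff br1 br2

# Base-to-tip: each side's own work, ignoring the intermediate commits.
base=$(git merge-base br1 br2)
echo "what we did:";   git diff "$base" br1
echo "what they did:"; git diff "$base" br2
```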

Going back to your own more concrete case, we have:

          J   <-- main (HEAD)
         /
...--G--H
         \
          I   <-- linkedListUpdate

and we have Git compare H vs J to see what you did—change one line, probably:

changed a random comment

—and that's the change Git wants to take from "your side" of the operation. We have Git compare H vs I to see what your professor did. Then we have Git combine these two sets of changes.

You will get a merge conflict if:

  • you and he changed the same lines, in different ways, or
  • you made a change to a line, and he made a change to an adjacent line, so that your two sets of changes abut (touch at the edge).

But as long as Git is able to apply your change(s) to places the other branch doesn't touch, and vice versa, Git will be able to combine these changes on its own.
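Both outcomes can be reproduced in a scratch repository. The sketch below uses invented names (code.c, the branches update and update2): first the two sides edit lines that are far apart, so the merge is clean. Then both sides edit the same line differently, so Git stops with a conflict.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

# Merge base: a nine-line file.
printf '%s\n' one two three four five six seven eight nine > code.c
git add code.c
git commit -q -m "starting point"
git branch update

# Our side (main) edits line 1; their side (update) edits line 9.
printf '%s\n' ONE two three four five six seven eight nine > code.c
git commit -q -am "edit line 1"
git checkout -q update
printf '%s\n' one two three four five six seven eight NINE > code.c
git commit -q -am "edit line 9"
git checkout -q main

# No overlap, so Git combines both changes on its own.
git merge -q --no-edit update
first=$(sed -n 1p code.c)
last=$(sed -n 9p code.c)

# Now both sides change line 5, in different ways.
git branch update2
printf '%s\n' ONE two three four 'five (mine)' six seven eight NINE > code.c
git commit -q -am "edit line 5 on main"
git checkout -q update2
printf '%s\n' ONE two three four 'five (theirs)' six seven eight NINE > code.c
git commit -q -am "edit line 5 on update2"
git checkout -q main

conflicted=no
git merge --no-edit update2 || conflicted=yes
unmerged=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted, unmerged files: $unmerged"
git merge --abort   # back out; a real workflow would resolve and commit
```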

Having combined the changes , Git then applies those combined changes to the snapshot in the merge base commit H . The resulting snapshot is ready to go into a new merge commit , which I like to call M for Merge:

          J
         / \
...--G--H   M   <-- main (HEAD)
         \ /
          I   <-- linkedListUpdate

The only thing special about M is that, in its metadata, it lists two previous commits instead of just one. 4 As with any commit, the act of creating the new commit tells Git to update the current branch name so that the branch now points to the new commit. So now commit M is on main . Because M points backwards to two previous commits, though, suddenly commit I is also on branch main .
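The two parents of a merge commit are easy to inspect. A minimal sketch, again in a scratch repository with made-up file names, where each side adds a different file so the merge cannot conflict:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

printf 'shared\n' > base.txt
git add base.txt
git commit -q -m "H"
git branch linkedListUpdate

printf 'mine\n' > mine.txt
git add mine.txt
git commit -q -m "J"
J=$(git rev-parse HEAD)

git checkout -q linkedListUpdate
printf 'theirs\n' > theirs.txt
git add theirs.txt
git commit -q -m "I"
I=$(git rev-parse HEAD)

git checkout -q main
git merge -q --no-edit linkedListUpdate   # creates merge commit M

# M's metadata lists two parent lines instead of one.
git cat-file -p HEAD
p1=$(git rev-parse HEAD^1)   # first parent: where main was (J)
p2=$(git rev-parse HEAD^2)   # second parent: the merged-in tip (I)

# Commit I is now reachable from main, so it is "on" main too.
git branch --contains "$I"
git log --graph --oneline
```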

(This shows that branch , in Git, is kind of meaningless. And yet branch names are crucial since they're how we find the commit from which we work backwards. So branches are meaningless, and also very important. Well, that's Git for you.)


3 In the less-obvious cases, the answer is to use the Lowest Common Ancestor algorithm as extended for DAGs, which in this case also quickly finds H .

4 Technically, a merge commit in Git can list any number of parent commits as long as the number is at least two. A commit with one parent is a normal, boring, ordinary commit, and one with no parents at all is a root commit. The very first commit you make in a new empty repository is a root commit, and often that's the only one ever in all clones of that repository. Git's algorithms all work with more than one root commit, but it's usually not a great idea to go about inserting extra roots.


Fake merges

We often find ourselves making a new branch name:

...--G--H   <-- main (HEAD), feature

and then making a commit or two on the new branch, with git switch feature (we're still on commit H , but we've changed which name we're using to find commit H ) to get:

...--G--H   <-- main
         \
          I--J   <-- feature (HEAD)

If we git switch back to main and run git merge feature , we find ourselves in this situation:

...--G--H
         \
          I--J   <-- main (HEAD), feature

What happened here? Why did the name main just move forward to point directly to J ?

Well, remember how git merge works. We start by finding the best (right-most, in my drawings) commit that is on both branches . In this case, that's commit H . Then we have to diff the snapshot in that merge base against the snapshots in each branch-tip commit, H and J respectively. But:

git diff <hash-of-H> <hash-of-H>

is never going to show any difference. The snapshot in H is the snapshot in H . There's no change here, by definition!

So, Git notices that the merge base commit is the current commit, and takes a short-cut. Git calls this a fast-forward merge , although there's no actual merging involved: Git checks out the contents of the other commit, dragging the branch name forward so that main and feature now both point to commit J .

You can prevent this by telling Git not to allow that. Simply run:

git merge --no-ff feature

and you get:

...--G--H------M   <-- main (HEAD)
         \    /
          I--J   <-- feature

where new merge commit M combines the work and therefore has the same snapshot as commit J , but has two parents as usual for a merge commit.

Even if there's just one commit, this preserves the "branch-y-ness" by adding a "merge bubble", as many call it. Using --no-ff is your choice: do it if you like, don't do it if you prefer the straight-line look.
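The difference between the two behaviours can be sketched as follows, in a scratch repository with an invented file code.c. The first merge fast-forwards, then main is reset back to H purely so the same merge can be repeated with --no-ff.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

printf 'base\n' > code.c
git add code.c
git commit -q -m "H"
H=$(git rev-parse HEAD)

git checkout -q -b feature
printf 'base\nI\n' > code.c
git commit -q -am "I"
printf 'base\nI\nJ\n' > code.c
git commit -q -am "J"
J=$(git rev-parse HEAD)

# Default behaviour: main has no commits of its own, so Git fast-forwards.
git checkout -q main
git merge -q feature
ff_tip=$(git rev-parse main)

# Put main back on H (for the demonstration only) and forbid the fast-forward.
git reset -q --hard "$H"
git merge -q --no-ff --no-edit feature
M=$(git rev-parse HEAD)
git log --graph --oneline
```

After the first merge, main and feature name the same commit. After the second, main names a new commit M whose snapshot matches J but which has both H and J as parents.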

What I expected: A bunch of merge conflicts for me to resolve to pop up.

What actually happened: linkedListUpdate overwrote what I had in my main branch, leaving me with just my professor's template code.

That behavior is correct. main has just one change to a random comment since linkedListUpdate branched. Unless linkedListUpdate also changed the same comment, there will be no conflicts.

This is called a 3-way merge and it works by comparing the tips of the two branches plus their most recent common ancestor commit, the merge base (that's "updated scripts..."). Any commits before the base are ignored.

              B [linkedListUpdate]
             /
X - Y - Z - A - C [main]

A is the common ancestor between main and linkedListUpdate. A 3-way merge looks at the differences between A and B, and the differences between A and C. If A/B has no changes in common with A/C there is no conflict and the A/B changes are automatically applied to C. If A/B and A/C both change the same lines there is a conflict and a human must decide what to do.

The previous commits (X, Y, and Z) have no effect on the merge.

What is the better way to separate main from the other branch so that merge forces conflict resolution instead of a fast-forward?

What you posted cannot result in a fast-forward. A fast-forward can only happen if there are no changes to main and no need for a merge commit.

This illustrates a fast-forward.

              B [linkedListUpdate]
             /
X - Y - Z - A [main]

$ git checkout main
$ git merge linkedListUpdate

              B [linkedListUpdate] [main]
             /
X - Y - Z - A

Instead of making a merge commit, Git simply moved main to the same commit as linkedListUpdate.

You can force a merge commit with git merge --no-ff . I recommend this as a merge commit preserves the branch in the history as a collection of related changes, and provides a place to record information about that branch such as a related entry in an issue tracker.

However, again, there is no conflict to resolve here.

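If you want to manually verify the merge, use git merge --no-commit (not -n, which is short for --no-stat and only suppresses the diffstat). This will do the merge but leave the changes staged and uncommitted. Add --no-ff if the merge could otherwise fast-forward, since a fast-forward makes no merge commit to hold back. You can then examine and commit them. However, this is not normal procedure. Normal procedure is to let the automatic merge happen and only require intervention if there is a conflict. If the merge goes badly, it can always be undone .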
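As a sketch of that inspect-then-commit flow, the long option for stopping before the commit is --no-commit. The example pairs it with --no-ff so it also holds when a fast-forward would be possible. The repository and file names are invented. The final reset is just one way to undo a merge that has not been shared with anyone.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

printf 'shared\n' > base.txt
git add base.txt
git commit -q -m "H"
git branch linkedListUpdate

printf 'mine\n' > mine.txt
git add mine.txt
git commit -q -m "J"
before=$(git rev-parse HEAD)

git checkout -q linkedListUpdate
printf 'theirs\n' > theirs.txt
git add theirs.txt
git commit -q -m "I"
git checkout -q main

# Do the merge work but stop short of committing.
git merge --no-commit --no-ff linkedListUpdate
staged=$(git diff --cached --name-only)   # what the merge would bring in
still_at=$(git rev-parse HEAD)            # main has not moved yet
git status --short

# Happy with it: conclude the merge using the prepared message.
git commit -q --no-edit
parents=$(git rev-list --parents -n 1 HEAD | wc -w)   # M plus its two parents

# Changed our mind afterwards: move main back to where it was.
git reset -q --hard "$before"
```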

What you actually want to do is an interactive rebase.

My professor then added code to his original template code and told everyone to please redo the assignment.

That sucks, and it's amongst the many reasons why I do not like template code for assignments.

A better way to handle this would have been to rewrite the original template commit using an "interactive rebase". Git then replays all the following commits on top of the new template. This will result in plenty of conflicts.

Use git rebase -i <template commit>~ . If it was the first commit use git rebase -i --root . This will bring up an editor with the template commit at the top. Change pick to edit , save, and quit. You can now edit that commit. Paste the professor's new template in. Add and amend the commit ( git commit --amend ), then git rebase --continue .

Amending a commit makes a new commit. Git will now rewrite every following commit on top of the new commit with the new template. This will result in plenty of conflicts for you to resolve, one commit at a time. This should make the job of adjusting your work to the new template easier, assuming you committed succinct changes.
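The steps above can be scripted for a quick trial run. This is a sketch under several assumptions: a scratch repository with invented contents, GNU sed standing in for the editor (via GIT_SEQUENCE_EDITOR) to change the first pick to edit, and a template update that touches a line far from the assignment work, so this particular replay happens to finish without conflicts.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Student"
git config user.email "student@example.invalid"

# Commit 1: the professor's original template (hypothetical contents).
printf '%s\n' 'header v1' 'TODO: push' a b c d e f 'footer v1' > code.c
git add code.c
git commit -q -m "Initial template"

# Commit 2: the assignment work, changing line 2 only.
printf '%s\n' 'header v1' 'push implemented' a b c d e f 'footer v1' > code.c
git commit -q -am "First assignment"

# Start the interactive rebase at the root, marking the first commit "edit".
# "|| true" tolerates whatever exit status Git reports when it pauses.
GIT_SEQUENCE_EDITOR="sed -i -e 1s/^pick/edit/" git rebase -i --root || true

# Paste in the updated template (here: only the last line differs) and amend.
printf '%s\n' 'header v1' 'TODO: push' a b c d e f 'footer v2' > code.c
git commit -q -a --amend --no-edit

# Replay the remaining commit on top of the new template.
GIT_EDITOR=true git rebase --continue

cat code.c
git log --oneline
```

With real assignment code, expect the replay to stop on conflicts. Resolve each one, git add the file, and run git rebase --continue again.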

See Rewriting History for an interactive rebase tutorial.

The issue here is that you based the professor's work on the wrong commit (possibly because the correct commit didn't exist). Basing it on your work implies that the professor undid all your work and updated the template, so that the merged version is just that template with your edited comment.

If you had done

git init foo; cd foo
git checkout -b templates
wget … -O code.c
git add code.c
git commit -m "Initial template"   # T1
git checkout -b main
emacs code.c
git commit -am "First assignment"  # A1
mail -s "Assignment 1" … <code.c
git checkout templates
wget … -O code.c
git commit -am "Updated template"  # T2
git checkout main
git merge templates                # A1'

which creates the commit graph

T1 --- T2
  \      \
   - A1 -- A1'

then there might very well have been merge conflicts preparing A1' (with no spurious comment-change commit). Even if there weren't, you would then have the right basis to continue

emacs code.c
git commit -am "Second assignment"
mail -s "Assignment 2" … <code.c
