
Why does merging branches sum total commits?

If Branch A has 5 commits and Branch B has 7, when I merge B into A, A now has 12 commits.

Is that the expected result? I would've thought a merge would be considered a single commit.

When you say "branch A has five commits", you're probably not counting all the commits that branch A contains. The same applies to your seven commits in branch B. To really understand this, it's important to realize that in Git, branches (or more precisely, branch names) don't actually have any meaning. It's only the commits that matter.

To see how this works, let's start with a really tiny repository with just three commits in it. The three commits have big long ugly Git-hash-ID names, but let's just call them commits A, B, and C, as if commits had single uppercase letters as their real names. (We'll run out pretty fast, which is one reason Git uses those big ugly hash IDs.)

The first big important secret of Git is that every commit stores its previous commit's hash ID inside it. Whenever you have the hash ID of a commit in your hands, we say that you're pointing to that commit. So our three commits go like this:

A <-B <-C

Commit C stores B's hash ID, so C points back to B. B stores A's hash ID, so B points back to A. A is of course the very first commit we ever made: it can't point any further back. It's a special case: a root commit, of which there's always at least one if the repository isn't empty. Usually there's exactly one root commit, with that one being the very first commit.
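One way to see these backward pointers for yourself is sketched below. It assumes a throwaway repository in a temporary directory; the commit helper, the demo identity, and the one-file-per-commit layout are made up for illustration and are not part of the original answer.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; hash_of_a=$(git rev-parse HEAD)
commit B; hash_of_b=$(git rev-parse HEAD)
commit C

# The raw commit object for C has a "parent" line holding B's hash ID.
git cat-file -p HEAD | grep '^parent'
parent_of_c=$(git rev-parse 'HEAD^')

# Root commits are the ones with no parent at all; here that is only A.
roots=$(git rev-list --max-parents=0 HEAD)
echo "C's parent: $parent_of_c"
echo "root commit(s): $roots"
```

The parent printed for C should be the hash recorded for B, and the only root should be A.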

Branch names

The next big important secret is a simple follow-on to this first one, and that is that a branch name like master or develop simply points to one commit . The one commit that our master points to, in this case, will be commit C :

A--B--C   <-- master

I always get a bit lazy about drawing the internal arrows between commits, for various reasons. One is that once we make a commit, nothing and no one—not even Git itself—can change the commit. Commit C is frozen in time forever, always pointing back to B , which is frozen and points to A , and so on. The internal arrows therefore invariably point backwards . Git calls these the parents of the commit: the parent of C is B , and the parent of B is A .

The branch name pointers are different. Unlike the frozen contents of each commit, a branch name pointer can and does change.

Let's git checkout master, which extracts commit C into our work tree, giving us files we can see and work on / with. Then we'll make some changes, git add the updated files, and git commit to make a new commit that we'll call D. Git will package up our new files and make this new commit D, pointing back to the commit we had checked out (i.e., C), so that we now have:

A--B--C--D

and then as its final act, git commit writes D 's hash ID into the name master , so that master now points not to C but to D :

A--B--C--D   <-- master

This is how branches grow as you add new commits: each new commit points back to the one that was the last one in the branch, and then Git updates the branch name so that the name now identifies the new tip. Whenever Git looks for the history—for what happened over time—it works by starting at the last commit, the one pointed-to by the name, and working backwards, one commit at a time.
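The name-moving step can be watched directly. This is a minimal sketch in a scratch repository; the commit helper and demo identity are assumptions for the example, and the single-letter commit messages stand in for the real hash IDs.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B; commit C
before=$(git rev-parse master)          # master names commit C

commit D
after=$(git rev-parse master)           # master now names commit D
parent_of_d=$(git rev-parse 'master^')  # D points back to C

echo "master moved from $before to $after"
```

Commit C itself is untouched; only the hash stored under the name master changed.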

Making new branches

To make a new branch , what Git does is just add a new name pointing to some existing commit. Let's make branch branch-a now, in our four-commit repository:

A--B--C--D   <-- master, branch-a (HEAD)

Besides adding the name branch-a pointing to D, I've attached the special name HEAD (in all capitals, though you can use @ if you like a shorter name) to one of the two branch names. That's how Git remembers the current branch.

Before we make any new commits, answer for yourself: how many commits are there in master , and how many are there in branch-a ? If you didn't answer "four" each time, why not? If you ask Git, the answer is four: there are four commits, D then C then B then A , on both branches.

Let's add five commits to our new branch-a now, by changing stuff and using git add and git commit in the usual way. Git will construct five new, unique, big ugly hash IDs, but we'll call the new commits EFGHI and draw them in:

           E--F--G--H--I   <-- branch-a (HEAD)
          /
A--B--C--D   <-- master

When we made E, Git made it with parent D, and then changed the name branch-a to point to E. When we made F, its parent was E, and Git updated branch-a to point to F. We repeated this five times and we have five commits on branch-a that aren't on master, plus the four commits that are on both branches. So branch-a has not five but rather nine commits. It's just that five of them are only on branch-a.
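To ask Git for these counts, git rev-list --count works well. The sketch below rebuilds the same shape in a scratch repository (the helper function and demo identity are assumptions for the example): the plain count walks back from the branch tip, while the master..branch-a form excludes anything reachable from master.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

for c in A B C D; do commit "$c"; done
git checkout -q -b branch-a                    # new name, same commit D
on_master=$(git rev-list --count master)
on_a_before=$(git rev-list --count branch-a)

for c in E F G H I; do commit "$c"; done
on_a=$(git rev-list --count branch-a)                  # everything reachable
only_on_a=$(git rev-list --count master..branch-a)     # minus what master has

echo "before: master=$on_master branch-a=$on_a_before"
echo "after:  branch-a=$on_a, of which $only_on_a are not on master"
```

This prints 4 and 4 before the new commits, then 9 and 5 afterwards.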

Now let's make branch-b , by first switching back to master and then creating the new name branch-b , pointing to commit D :

           E--F--G--H--I   <-- branch-a
          /
A--B--C--D   <-- master, branch-b (HEAD)

Note that nothing else inside the repository itself has changed here. Our work-tree (and index) have changed—they've gone back to commit D —and we've added a new name branch-b that, like master , identifies commit D , but the commits are all undisturbed.

Now let's add seven commits that are unique to branch-b :

           E--F--G--H--I   <-- branch-a
          /
A--B--C--D
          \
           J--K--L--M--N--O--P   <-- branch-b (HEAD)

There are actually 11 commits on branch-b , but four of them are shared (with master , which I've stopped drawing out of laziness, and with branch-a ).

Merging

Now you want to merge branch-b into branch-a . So the commands you run will be:

git checkout branch-a
git merge branch-b

The first step chooses commit I as the current commit and branch-a as the name to which HEAD is to be attached. It copies the contents of commit I to the work-tree (and index / staging-area). There are no changes to the graph itself, but now HEAD indicates branch-a and hence commit I :

           E---F----G---H----I   <-- branch-a (HEAD)
          /
A--B--C--D
          \
           J--K--L--M--N--O--P   <-- branch-b

(I've also stretched out the top line a bit because of something I intend to draw in a moment. The position of the commits in the graph is stretchy because Git doesn't care about the actual time of the commit, only about the shape of the commits and their connecting arcs, and you can bend and twist the graph however you like, as long as you don't break any of the connections, or make up new ones that aren't there.)

The git merge command then does something a little tricky. First, it finds the merge base between the current commit I and the other commit P . The merge base is, roughly speaking, the point where the two branches diverged. In this case that's super-obvious from the graph: it's commit D .

Git now figures out what "we" changed on branch-a by doing:

git diff --find-renames <hash-of-D> <hash-of-I>   # what we changed

It gets a second diff to find out what they changed on branch-b :

git diff --find-renames <hash-of-D> <hash-of-P>   # what they changed

Git then combines the two sets of changes, applying the combined changes to whatever is in the snapshot in commit D .

This "make two diffs, combine them, and apply them to the merge base" process is the action form of merging. I like to refer to this as the verb to merge, i.e., to combine changes. Because commits are snapshots, not change-sets, Git has to do the two diffs. In order to have a sensible starting point, Git has to find the merge base. That's why we have all this work that happens as part of the verb to merge when we merge commits I and P.
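You can reproduce the inputs to this process by hand with git merge-base and the two diffs. The following is a rough sketch in a scratch repository, not what git merge literally runs internally; the helper, the demo identity, and the choice to list only file names (--name-only) are assumptions to keep the output short.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

for c in A B C D; do commit "$c"; done
hash_of_d=$(git rev-parse master)
git checkout -q -b branch-a;        for c in E F G H I; do commit "$c"; done
git checkout -q -b branch-b master; for c in J K L M N O P; do commit "$c"; done

# The merge base of the two tips is the commit where they diverged.
base=$(git merge-base branch-a branch-b)

# "What we changed" and "what they changed", each measured from the base.
git diff --find-renames --name-only "$base" branch-a
git diff --find-renames --name-only "$base" branch-b
ours=$(git diff --find-renames --name-only "$base" branch-a | grep -c .)
theirs=$(git diff --find-renames --name-only "$base" branch-b | grep -c .)
echo "merge base is D: $base ($ours files changed by us, $theirs by them)"
```

Because every commit here adds its own file, the two change sets do not overlap and combining them is trivial; real merges can of course conflict.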

Merge commits

Now that Git has done all this to-merge work, Git will make a merge commit . Well, it will often or usually make one—we'll see the exceptions in a moment. Note that this uses the word merge as an adjective, though, modifying the word commit . We can also refer to this new merge commit as a merge , using the word merge as a noun. I like to refer to this as merge-as-a-noun or merge-as-an-adjective, to distinguish it from the process , the to merge verb. For the git merge command, we're doing the process first, then making the merge commit at the end. But let's draw it:

           E---F----G---H----I
          /                   \
A--B--C--D                     Q   <-- branch-a (HEAD)
          \                   /
           J--K--L--M--N--O--P   <-- branch-b

This new commit, merge commit Q , is special in precisely one way: it has two parents instead of one. It points back first to commit I , to say commit I was at the tip of branch-a a moment ago and is a parent of commit Q , but then it also points back to commit P , to say commit P is also a parent of commit Q .

If we now ask Git how many (and which) commits are on branch-a, Git starts at Q, then works backwards through both I and P, eventually arriving at D (to which master still points), and then all the way back to A. So the number of commits is now 17: A through D plus E through I plus J through P plus Q. If we ask how many commits are on branch-a that aren't on master, we get 13: five for E through I, seven for J through P, and one for Q.
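Here is the whole scenario as a sketch you can run in a scratch directory to check those numbers. The commit helper and demo identity are assumptions for the example; --no-edit just accepts the default merge message so no editor opens.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

for c in A B C D; do commit "$c"; done
git checkout -q -b branch-a;        for c in E F G H I; do commit "$c"; done
git checkout -q -b branch-b master; for c in J K L M N O P; do commit "$c"; done
tip_i=$(git rev-parse branch-a)
tip_p=$(git rev-parse branch-b)

git checkout -q branch-a
git merge -q --no-edit branch-b        # makes merge commit Q

first_parent=$(git rev-parse 'branch-a^1')    # I
second_parent=$(git rev-parse 'branch-a^2')   # P
total=$(git rev-list --count branch-a)
not_on_master=$(git rev-list --count master..branch-a)
echo "branch-a: $total commits, $not_on_master not on master"
```

The output should read 17 and 13, and the two parents of Q are the old tips I and P, in that order.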

There are lots of ways to draw this

Here's another way to draw what happened:

...--D--E--F--G--H--I------Q   <-- branch-a (HEAD)
      \                   /
       J--K--L--M--N--O--P   <-- branch-b

The number of reachable commits remains the same, though: Git starts at Q , moves back to both I and P , moves back to both H and O , and so on until reaching D when it moves back to whatever comes before shared commit D .

If you have git log draw the graph, using git log --graph or git log --graph --oneline , Git will draw it vertically, with commit Q at the top and the branching structure represented as individual lines:

* hashofQ (HEAD -> branch-a) Merge ..
|\
| * hashofP commit message for P
* | hashofI commit message for I
...

or similar—the exact position of each * and line depends on additional sorting options you may pass to git log such as --author-date-order , though --graph always enforces at least the --topo-order option. Graphical viewers such as gitk , and various GUIs, may mimic git log --graph --oneline but make it all prettier (though as always, beauty is in the eye of the beholder).

Squash merge: git merge doesn't always merge

The git merge command can do more than build a merge (noun) using the to merge (verb) process. arkus mentioned git merge --squash , which does the to merge part of the process, but then simply stops, without making a commit and without recording the fact that the next commit should be a merge. In this particular case, we'd then run git commit ourselves to make commit Q . New commit Q would be an ordinary commit , not a merge commit, and we might draw it in like this:

...--D--E--F--G--H--I--Q   <-- branch-a (HEAD)
      \
       J--K--L--M--N--O--P   <-- branch-b

Because there is no connection between Q and P , someone coming in later—including yourself, or Git—and looking at this graph may have no idea that commit Q is the result of a merge. The seven commits that are exclusive to branch-b are still exclusive to branch-b . In general, if you have done this, you should immediately remove the name branch-b from this repository and from every clone of this repository , so as to utterly forget that commits JKLMNOP ever existed.
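The same setup with a squash merge shows the difference in the graph. This is a sketch under the same assumptions as before (scratch repository, made-up helper and identity); the final git branch -D is included only to illustrate the clean-up step described above.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

for c in A B C D; do commit "$c"; done
git checkout -q -b branch-a;        for c in E F G H I; do commit "$c"; done
git checkout -q -b branch-b master; for c in J K L M N O P; do commit "$c"; done

git checkout -q branch-a
git merge -q --squash branch-b     # does the to-merge work, then stops
git commit -q -m Q                 # we make ordinary commit Q ourselves

total=$(git rev-list --count branch-a)                 # A-D, E-I, and Q
merges=$(git rev-list --merges --count branch-a)       # no two-parent commits
only_on_b=$(git rev-list --count branch-a..branch-b)   # J-P still separate
echo "branch-a: $total commits, $merges merges; $only_on_b only on branch-b"

# Q's snapshot has branch-b's files even though Q does not point to P.
ls P.txt

# Forget J-P; plain -d would refuse because they are not merged anywhere.
git branch -q -D branch-b
```

Here branch-a ends up with 10 commits rather than 17, and none of them is a merge commit.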

This is sometimes, but not always, a viable, useful, and good work-flow. It's particularly useful when the individual commits on branch-b have never been seen anywhere else, so that you know nobody else has them, and you only made them as temporary commits with the intent to replace them all with a single "add the feature" commit, i.e., commit Q, at the end. After doing the squash merge, you force Git to delete your branch-b name and you forget that you ever did any of the individual commits. You have one final good commit and you pretend to the world that you knew how to make that commit all at once.

Sometimes, though, even if you're introducing a feature, it's good to keep it as a series of separate commits. In particular, what if you've introduced a bug too? In that case, if you shrink your feature down to a series of simple but clear commits—let's say three of them—and then you merge them with a real merge, you get a graph like this:

...--D--E--F--G--H--I--Q   <-- branch-a (HEAD)
      \               /
       R-------S-----T   <-- branch-b

If it now turns out that you have introduced a bug, it's probably possible to check out commits R and S and T and see which of those commits introduced the bug . Then you can compare R vs D , S vs R , or T vs S , to help you find out how the bug got in, and figure out what to do to fix it.

What this boils down to is that squash merges aren't bad, they're just a tool. Use your tools to do things in a way that will make life easier for yourself in the future. If that means squashing, go ahead and squash. If not, don't.

Fast-forward: git merge doesn't always merge

We should also cover fast-forward operations. Consider a situation in which you make a branch:

...--C--D   <-- master, feature (HEAD)

You then make some commits on that branch:

...--C--D   <-- master
         \
          E--F--G--H   <-- feature (HEAD)

Everything seems great and you'd like to introduce the feature now, keeping all four of these commits intact. If you now run:

git checkout master
git merge feature

Git will say something about fast-forward , and you will be left with this graph:

...--C--D--E--F--G--H   <-- master (HEAD), feature

The name feature has not moved—it still points to commit H —but the name master has moved, and now also points to commit H . There's no new merge commit!
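A short sketch of that fast-forward, run in a scratch repository (the helper and demo identity are assumptions for the example; C and D stand in for whatever history came before):

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit C; commit D
git checkout -q -b feature
for c in E F G H; do commit "$c"; done

git checkout -q master
out=$(git merge feature)          # Git reports a fast-forward
echo "$out"

master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
merges=$(git rev-list --merges --count master)
echo "master=$master_tip feature=$feature_tip merge commits=$merges"
```

Both names end up on commit H, and the count of merge commits stays at zero.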

What Git did here is that it did the merge-base finding just as it would for a real merge, and found that the best common commit between master and feature was commit D . The name master pointed to commit D , though, so if Git were to do the usual to merge verb, it would run:

git diff --find-renames <hash-of-D> <hash-of-D>   # what we changed

and the answer would, of course, be we changed nothing! Then Git would need to diff D vs H to find out what they changed, which of course would be whatever they changed. Git would apply those changes to D and get ... commit H , again.

If Git made a real merge out of this, it would look like:

...--C--D------------I   <-- master (HEAD)
         \          /
          E--F--G--H   <-- feature

The snapshot for commit I would match that for commit H .

You can force Git to make this merge commit:

git checkout master
git merge --no-ff feature

That way you get the same kind of true merge you would have gotten had master had some commit after D . You can do this if you want to emphasize to a future viewer—who may well be yourself in a year or two—that commits EFGH were made as a group, and together they implement some feature. Or you may not care: you, and future-you a year from now, may prefer to just see commits EFGH as a logical extension of master , without any need to remember that these four were done specifically for some particular feature.
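To check what --no-ff produces, the sketch below repeats the feature-branch setup in a scratch repository and then compares parents and snapshots. The helper and demo identity are assumptions; --no-edit accepts the default merge message.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit C; commit D
hash_of_d=$(git rev-parse master)
git checkout -q -b feature
for c in E F G H; do commit "$c"; done

git checkout -q master
git merge -q --no-ff --no-edit feature    # forces merge commit I

merges=$(git rev-list --merges --count master)
first_parent=$(git rev-parse 'master^1')         # D
second_parent=$(git rev-parse 'master^2')        # H
merge_tree=$(git rev-parse 'master^{tree}')      # snapshot of I
feature_tree=$(git rev-parse 'feature^{tree}')   # snapshot of H
echo "merge commits on master: $merges"
```

The merge commit's tree hash equals that of H, which is the "snapshot for I matches H" claim in concrete form.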

Again, this really boils down to the fact that fast-forward merge vs real merge is a tool, which you can use to communicate information to future users of this repository. Use your tools to arrange things to make the life of future-you easier.

If you think you'll prefer to see the merge in a git log --graph or graphical viewer, force the non-fast-forward merge with git merge --no-ff. If you think you'll prefer not to see the merge, you can even use git merge --ff-only to make sure that Git will just fail if a true merge is required (after which you will need to do something different, and that's beyond the scope of this already-too-long answer).
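For the --ff-only case, here is a small sketch of the failure mode in a scratch repository. The extra commit X on master is hypothetical, added only so that the two branches have diverged; the helper and demo identity are likewise assumptions.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit C; commit D
git checkout -q -b feature; commit E
git checkout -q master;     commit X    # master now has its own commit

before=$(git rev-parse master)
# A true merge would be needed here, so --ff-only makes Git give up.
if git merge --ff-only feature 2>/dev/null; then
    result=merged
else
    result=refused
fi
after=$(git rev-parse master)
echo "merge $result; master unchanged: $before"
```

Git exits with an error and leaves master exactly where it was.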

It depends on the history of the branches: the result could be anywhere between 7 revisions (a fast-forward) and 13 revisions (a merge of 2 totally unrelated histories: 5 + 7, plus 1 for the merge revision). It all depends on the histories and how many diverging revisions you are talking about (or whether you are forcing a merge revision with --no-ff). One possible way to get 12 revisions is to have a single common ancestor between both branches, so you have the common ancestor, 4 revisions on one branch and 6 on the other (after the common ancestor) plus the merge revision: 1 + 4 + 6 + 1 = 12. But as I said, it all depends on the history. 8 could be achieved by having all 5 revisions of one branch be the first 5 revisions of the other branch and then running merge --no-ff, which creates a merge revision for what would have been a fast-forward: 7 + 1 = 8 revisions. With 4 common ancestors and merging you get 9 revisions... and so on.
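The 12-revision case can be reproduced with a sketch like the one below, assuming a scratch repository. The branch names, commit names, helper, and demo identity are all made up for the example; only the shape (one shared ancestor, then 4 and 6 revisions) comes from the description above.

```shell
# Throwaway repository in a temp directory; the identity is a placeholder.
cd "$(mktemp -d)" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit root                                   # the single common ancestor
git checkout -q -b branch-a
for c in a1 a2 a3 a4; do commit "$c"; done
git checkout -q -b branch-b master
for c in b1 b2 b3 b4 b5 b6; do commit "$c"; done

count_a=$(git rev-list --count branch-a)      # 1 + 4
count_b=$(git rev-list --count branch-b)      # 1 + 6

git checkout -q branch-a
git merge -q --no-edit branch-b
count_merged=$(git rev-list --count branch-a) # 1 + 4 + 6 + 1
echo "A=$count_a B=$count_b merged=$count_merged"
```

This prints A=5 B=7 merged=12, matching the numbers in the question.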

If you want to merge the branch as a single commit, you can use the --squash option of git merge.

It combines all the changes from the branch passed to git merge --squash <branch> and stages them, so that you can then commit them as one commit.

[Figure: history after a default git merge branch]

[Figure: history after git merge --squash branch]
