
Is it possible to merge only one sub-repository (subtree) from main repository?

Assume I have a MainRepo that includes subtree repositories: subRepoA, subRepoB, SubRepoC. I have made changes in all of them, but would like to merge and push only the changes that were made in subRepoB. Is that possible? MainRepo seems to behave like one big repository, with no way to distinguish between its sub-repositories.

The answer here is both no and yes. That is, you can achieve what you are asking for, but:

  • not with a single simple git merge command (it will require additional commands); and
  • it's often a bad idea. Be careful: you may regret it later. However, if you read through all of the below and think about how merging works, you can do it, and you can figure out how to update later if necessary.

To do it, though, use:

git merge --no-commit

and then use git checkout or (since Git 2.23) git restore to "undo" some of the merge. Then finish the merge with git merge --continue or git commit. See the details below for more information.

Background

To understand how this all works (and why it's a bad idea), remember this about Git: Git is about commits. Git is not about files, and not even about branches. It's true that commits contain files (that's why we have commits, to hold files) and that branch names find commits, which is why we have branch names. But in the end, Git is all about commits.

  • Commits are numbered. These aren't simple counting numbers: we don't start with commit #1, then have #2, #3, and so on. Instead, each one has a random-looking (but actually not random at all), unique hash ID, which is shown as a big ugly string of letters and digits, often abbreviated since humans will generally just kind of blip over them (is dca3c76df9bb99b0... the same as dca3c76dfb9b99b0...?).

  • No part of any commit can be changed, once it is made. The reason for this is that the hash ID is actually a cryptographic checksum of every bit of the commit. If you do take one out, make some changes, and put it back, what you get is a new commit with a new and different hash ID. The old commit, with its unique number, is still there, and anyone who looks up the number gets the old commit.

  • Each commit stores two things:

    • There's a full snapshot of every file that Git knows about. The files are stored in a special, read-only, Git-only, compressed and de-duplicated format. (The de-duplication immediately handles the fact that most files in most commits are exactly the same as the versions of those same files in a previous commit.)

    • Meanwhile, each commit stores some metadata, i.e., information about the commit itself. This includes who made it (name and email address), when they made it, and their log message explaining why they made it. In this metadata, Git stores something that Git itself needs: the commit number of the commit that comes before the commit we're looking at here. Git calls this the parent commit.


    The fact that each commit stores the number—the hash ID—of its parent means that, if we can just find the last commit in a string of commits, Git can use that to work backwards. That is, suppose we use single uppercase letters to stand in for the actual hash IDs, and draw the following:

     ... <-F <-G <-H

    Using hash ID H, Git can retrieve the actual commit, including the snapshot, that you made, whenever you made it. So that gets you the files. It also gets Git the metadata, including the hash ID of earlier commit G. This means Git can extract both commits and compare the files in G to those in H, to show you what you changed in H. Git can also print out the name and email address of the person who made snapshot G, and use G's metadata to find commit F. Comparing the snapshot in F to that in G, Git can show you what changed in G, and Git can keep going back from F to even earlier commits, and so on.

    Of course, we have to somehow find the hash ID of commit H .

  • A branch name like master or develop simply holds one (1) hash ID. But as long as we—or Git—make sure that this is the hash ID of the last commit in the chain, we're all good:

     ...--F--G--H <-- master

    Making a new commit requires that Git store the new commit's hash ID into the branch name:

     ...--F--G--H--I <-- master

    Once we make commit I (while we're using master as the name), Git will automatically update master so that it points to the last commit. The parent of new commit I will be existing commit H .

    Since the "arrows" from each commit, pointing to its parent, are part of the commit, they can't be changed. Like everything in a commit, these are purely read-only. Note that the arrow coming out of the branch name does change, though. So that's why I keep drawing that arrow, while turning the commit-to-commit arrows into simpler lines: we just have to remember that commits point backwards, and Git works backwards.

  • Commits can be on more than one branch at a time. For instance:

     ...--F--G--H <-- master, develop

    Here, both names identify commit H as their last commit. So all the commits are on both branches.

    The technical term for this is reachability . We're just going to use this lightly below, in merges, but think about starting from commit H and working backwards, one commit at a time. Without moving, we've reached commit H . We move back one step, and we've reached commit G . Move back two steps and we're at commit F , and so on.

  • Note that Git can compare any two commits, not just a parent and child pair. We put the earlier commit on the left (well, normally anyway) and the later commit on the right. Git then compares the two commits' snapshots. For files that are the same, Git says nothing at all. For files that are different, Git figures out some set of changes we can make: add these lines after line 42, and remove line 86. This is a diff: it shows how to change the left-side file into the right-side file.

    If we compare parent and child, this diff listing is usually what we did. But note that Git will just find a set of changes. In some cases, that's not quite how we changed it. The diff Git finds will work, even if we did things a little differently—but sometimes (see merging below), this can cause minor but annoying merge conflicts, that wouldn't happen if Git did a better job here.

  • When we use git push (or git fetch, and therefore also git pull), Git works with commits. A push operation sends whole commits. This includes both the snapshot and the metadata. The two Gits know which commits each other has by just comparing those hash IDs: this is why the hash IDs are cryptographic checksums of the commits. Each Git either has a commit, or doesn't. Whichever Git is sending commits offers the hash ID to the receiving Git, which either says "yes, I need that one, send it" or "no thanks, I already have that one".
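To make the points in this list concrete, here is a minimal sketch you can run in a scratch directory. The repository location, the file name file.txt, and the user identity are all made up for illustration; only the git commands themselves matter.

```shell
#!/bin/sh
# Sketch: a commit's parent link, and the single hash ID a branch name holds.
set -eu
repo=$(mktemp -d)                        # throwaway repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pick the branch name explicitly
git config user.name "Example User"      # made-up identity for the demo
git config user.email "user@example.com"

printf 'one\n' > file.txt
git add file.txt
git commit -q -m "first commit"
first=$(git rev-parse HEAD)

printf 'two\n' >> file.txt
git commit -q -a -m "second commit"
second=$(git rev-parse HEAD)

# The branch name holds exactly one hash ID: that of the last commit.
tip=$(git rev-parse master)

# The second commit's metadata records the first commit as its parent.
parent=$(git rev-parse "$second^")
git cat-file -p "$second"   # shows the tree, parent, author, committer, message

# Comparing parent and child shows which files changed in the child.
changed=$(git diff --name-only "$first" "$second")
echo "branch tip: $tip"
echo "parent:     $parent"
echo "changed:    $changed"
```

Note that the branch name moved by itself when the second commit was made, while the first commit's hash ID stayed exactly what it was.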

git merge will merge commits and make a merge commit

The git merge command itself merges commits. We like to use it with branch names. That is, we start out with something like this:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

Because we have two names in this diagram, we need to remember which name we are using. That's where the special name HEAD comes in: we attach it to whichever branch we've told Git to use with git checkout or (since Git 2.23) git switch. That's the name that will be updated when we make a new commit.

So, now we run git merge branch2. Git uses the name branch2 to find one specific commit: the one the name points to. In this case, that's commit L. So two of the interesting commits are commit J, the one we are using right now, and commit L, the one we named on the command line.

The merge operation, however, actually requires three commits. The third one (or in a way, the first one) is whichever commit is the best common ancestor of the other two. You can think of this as Git looking at the two commits we've already named (J and L here) and working backwards. We'll move back as far as we need to, from both commits, until we find some commit we can reach from both of them.

In this case, the best shared commit is obvious: it's commit H. Commit H is on both branches. Commit G is too, but it's further back, so H is the best one. Git calls this commit the merge base.

To actually accomplish a merge, Git will now diff the merge base (commit H) against our current commit, J, to see what we changed:

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed

Then Git will diff the same merge base against the other commit we named:

git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

The heart of git merge (and what I like to call the verb form, or to merge) is now the process of combining these two diffs. Git has found the common starting point, and found two sets of changes: "ours" (from the HEAD / current-branch commit), and "theirs" (from the commit we named on the command line). As long as we and they changed different files, or different lines within the same file (see footnote 1), Git itself will be able to do this combining on its own.
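You can ask Git for the merge base and for both diffs yourself. The following is only a sketch of the graph drawn above, starting at commit H; the file names shared.txt, ours.txt, and theirs.txt and the user identity are assumptions made up for the demo.

```shell
#!/bin/sh
# Sketch: find the merge base of two branches and list each side's changes.
set -eu
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example User"        # made-up identity for the demo
git config user.email "user@example.com"

# Commit H: the shared starting point.
printf 'base\n' > shared.txt
git add shared.txt
git commit -q -m "H"
hash_h=$(git rev-parse HEAD)
git branch branch2

# Commits I and J on branch1 ("ours").
printf 'ours\n' > ours.txt
git add ours.txt
git commit -q -m "I"
printf 'more\n' >> ours.txt
git commit -q -a -m "J"

# Commits K and L on branch2 ("theirs").
git checkout -q branch2
printf 'theirs\n' > theirs.txt
git add theirs.txt
git commit -q -m "K"
printf 'more\n' >> theirs.txt
git commit -q -a -m "L"
git checkout -q branch1

# The merge base is the best commit reachable from both branch tips.
merge_base=$(git merge-base branch1 branch2)

# The two diffs that git merge combines, reduced here to file names.
ours_changed=$(git diff --find-renames --name-only "$merge_base" branch1)
theirs_changed=$(git diff --find-renames --name-only "$merge_base" branch2)
echo "merge base:   $merge_base"
echo "we changed:   $ours_changed"
echo "they changed: $theirs_changed"
```

Here the two sides touch different files, so combining the two sets of changes needs no help from us.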

Git will repeat this for all files. Git will apply the combined changes to the snapshot from the merge base (here, commit H), and if there are no conflicts, Git will make a new merge commit on its own. This is what I call merge as a noun: the adjective merge in front of the word commit is often used on its own as a noun, "a merge".

To prevent Git from making this commit on its own, we'll use --no-commit. Even without --no-commit, Git would still stop in the case of a merge conflict (and then you'd have to resolve the conflict before committing).

Before we go on to show how to undo part of the merge, let's pretend we finished the merge as normal, or left out --no-commit , so that we get the final merge commit. Let's draw it in:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

Note that the name branch1 has been updated as usual. It now points to new merge commit M. What makes M a merge is simple: instead of the usual single parent, it has two parents. Commit J is the first parent, and Git adds commit L as the second parent of the new commit.

The true significance of this new second parent will become clearer in a moment, but note that we're now able to reach both commits K and L from the name branch1 and commit M, by going down-and-left. So now all the commits are on (reachable from) the name branch1, while commits I and J are not on branch2: they're not reachable from branch2 because the last commit on branch2 is commit L, whose (single) parent is K, whose (single) parent is H. From H we can only go backwards to G, and then to F and so on.
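The two parents and the reachability claims can be checked directly. This sketch uses a reduced version of the graph (H, then just J on branch1 and L on branch2); the file names and the user identity are made up for the demo.

```shell
#!/bin/sh
# Sketch: a merge commit has two parents, and reachability is one-directional.
set -eu
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example User"        # made-up identity for the demo
git config user.email "user@example.com"

printf 'base\n' > shared.txt
git add shared.txt
git commit -q -m "H"
git branch branch2

printf 'ours\n' > ours.txt
git add ours.txt
git commit -q -m "J"
hash_j=$(git rev-parse HEAD)

git checkout -q branch2
printf 'theirs\n' > theirs.txt
git add theirs.txt
git commit -q -m "L"
hash_l=$(git rev-parse HEAD)

# Make merge commit M on branch1.
git checkout -q branch1
git merge -q -m "M" branch2

# HEAD^1 is the first parent (J), HEAD^2 the second parent (L).
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)

# L is now reachable from branch1, but J is still not reachable from branch2.
if git merge-base --is-ancestor "$hash_l" branch1; then
    l_on_branch1=yes
else
    l_on_branch1=no
fi
if git merge-base --is-ancestor "$hash_j" branch2; then
    j_on_branch2=yes
else
    j_on_branch2=no
fi
echo "L on branch1: $l_on_branch1, J on branch2: $j_on_branch2"
```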


1 If we both change (say) line 42, in different ways, Git will not know whether to use our change, or their change, or something different. Here Git will declare a merge conflict, and stop in the middle of the merge, with the merge unfinished. Your job becomes that of telling Git what the final result should be.

Git will also stop even if our change and their change simply abut (touch): if we replace the old line 42 with a new line 42, and they replace the old line 43 with a new line 43, Git will declare a merge conflict here as well. This is particularly helpful—yet also particularly annoying—with changes at the top of a file, or at the end, because Git doesn't know which order to put those changes in. For instance, if there is a 10-line file and we add an 11th line and they add an 11th line, which line goes first? Which line becomes line 12? Git itself doesn't know, so it makes whoever is doing the git merge provide the right answer.


Using (or abusing?) --no-commit

When Git makes the snapshot for merge commit M, Git does so in the same way it does for any commit. We haven't talked about the role of Git's index or staging area here (and for length reasons, we won't), but the point is that new commit M will have a snapshot, just like any other commit. We can, with git checkout or git restore, or just by editing the working tree copies of files and using git add, change what goes into commit M.

So, if we run:

git checkout branch1
git merge --no-commit branch2

and Git thinks it is all done but hasn't made the merge commit, we can now make particular files (such as every file in some directory) match the copies of those files in the HEAD (i.e., current, i.e., J) commit:

git checkout HEAD -- subdir2 subdir3

This will replace, in Git's index and your working tree, all the copies of the files in subdir2/ and subdir3/ with those from the HEAD snapshot. Or:

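git restore -SW --source HEAD subdir2 subdir3

which does the same thing for files that exist in HEAD. (Here -S is short for --staged, Git's index, and -W is short for --worktree. One difference: git restore works in no-overlay mode by default, so it also removes tracked files in those directories that the merge added and that don't exist in HEAD, while git checkout leaves such files in place.)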

If you now run git merge --continue or git commit, Git will make the snapshot for M from the merged files as updated by this step. You will get the same commit graph as before:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

What's different is the snapshot in commit M. The files that you restored match the copies in commit J, while the files that you didn't restore contain the merge that Git made automatically, using H, J, and L as the three input commits.

Note that nothing has changed in the three existing input commits. Nothing can change, so nothing did. This means that you can, if you like, re-do this same merge later, with or without --no-commit. Because all commit hash IDs include the time-stamp when computing the cryptographic checksum, a new merge, if and when you make it, will have a different hash ID than existing merge commit M. You may wish to make use of this fact later.
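Here is the whole procedure as a minimal sketch. The directory subdir1, the file names, and the user identity are assumptions made up for the demo: "their" commit L changes one file in each of three directories, and we keep only the change in subdir1.

```shell
#!/bin/sh
# Sketch: merge with --no-commit, then put two directories back before committing.
set -eu
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example User"        # made-up identity for the demo
git config user.email "user@example.com"

# Commit H: three directories with one file each.
for d in subdir1 subdir2 subdir3; do
    mkdir "$d"
    printf 'base\n' > "$d/file.txt"
done
git add .
git commit -q -m "H"
git branch branch2

# Commit J on branch1: an unrelated change, so the merge is a real merge.
printf 'ours\n' > ours.txt
git add ours.txt
git commit -q -m "J"

# Commit L on branch2: changes in all three directories.
git checkout -q branch2
for d in subdir1 subdir2 subdir3; do
    printf 'theirs\n' > "$d/file.txt"
done
git commit -q -a -m "L"
hash_l=$(git rev-parse HEAD)

# Merge, but stop before committing, then restore subdir2 and subdir3 from HEAD.
git checkout -q branch1
git merge -q --no-commit branch2
git checkout HEAD -- subdir2 subdir3
git commit -q -m "M: merge branch2, keeping our subdir2 and subdir3"

# Inspect the snapshot in the new merge commit M.
merged_subdir1=$(git show HEAD:subdir1/file.txt)
kept_subdir2=$(git show HEAD:subdir2/file.txt)
kept_subdir3=$(git show HEAD:subdir3/file.txt)
second_parent=$(git rev-parse HEAD^2)
echo "subdir1: $merged_subdir1, subdir2: $kept_subdir2, subdir3: $kept_subdir3"
```

The graph says that all of L was merged (L is the second parent of M), even though the snapshot took only the subdir1 change. That mismatch is exactly what makes this a risky technique.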

Commits are the history in a repository

Now that commit M exists:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

Git will, in essence, believe that commit M is the correct result of the merge. Let's add some more commits to branch1 and branch2, in the usual way (git checkout or git switch, plus the usual work), then get ready to merge branch2 into branch1 again:

          I--J
         /    \
...--G--H      M--N   <-- branch1 (HEAD)
         \    /
          K--L--O--P   <-- branch2

If we run git log we'll see commits N, then M, then (in some order) J and I and L and K, and then H, then G, and so on. If we run git log branch2, we'll see commit P, then O, then L, then K, then H, then G, and so on. This is because these are the reachable commits from each branch-tip commit. When traversing backwards through M, Git will visit both legs of the branch (see footnote 2): note that when viewed backwards, a merge is, in effect, a branch (and a branching split, where H split into streams I and K, is a merge).

In any case, if we now run:

git merge branch2

again (with or without --no-commit), Git will go through the usual process to locate the two branch-tip commits N and P and then work backwards to find the best shared commit as the merge base. In this case, that best shared commit is commit L: two steps back from N as long as we go down at the fork, and two steps back from P as well (see footnote 3).

Git will now do the usual diffing, from L to N to see what we changed, and from L to P to see what they changed. If we've used git checkout or git restore to make files in merge M match those in J, "what we changed" is to put our stuff from J back, and "what they changed" is often nothing at all, as the snapshots in O and P, on branch2, won't have to make any changes to keep their code.

What this means is that once you have told Git that the right way to merge J and L is to keep the files from J, Git will continue to believe that this is the right way to do the merge.

Note that if you re-perform the merge of J and L (by checking out either commit as a historical commit, or making a new branch name, and then merging the other commit), Git will still re-do the same work it did the first time we merged J and L. That is, this time, Git will again combine the files that you put back manually. It's when we do the merge of N and P, which both have commit L in their history and so use L as the merge base, that Git will "see" the merge we did earlier.
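The effect on the later merge can be seen in a small experiment. This is a sketch under made-up assumptions (two directories, one file each, a reduced graph without commits I, K, and O, and a placeholder identity): the first merge keeps our subdir2, and the second merge then leaves it alone.

```shell
#!/bin/sh
# Sketch: after a partial merge M, a later merge uses L as its merge base
# and does not bring back the subdir2 change that was dropped in M.
set -eu
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example User"        # made-up identity for the demo
git config user.email "user@example.com"

mkdir subdir1 subdir2
printf 'base\n' > subdir1/file.txt
printf 'base\n' > subdir2/file.txt
git add .
git commit -q -m "H"
git branch branch2

printf 'ours\n' > ours.txt
git add ours.txt
git commit -q -m "J"

git checkout -q branch2
printf 'theirs\n' > subdir1/file.txt
printf 'theirs\n' > subdir2/file.txt
git commit -q -a -m "L"
hash_l=$(git rev-parse HEAD)

# Merge M: take their subdir1 change, keep our subdir2.
git checkout -q branch1
git merge -q --no-commit branch2
git checkout HEAD -- subdir2
git commit -q -m "M"

# Commit N on branch1, and commit P on branch2 (touching only subdir1).
printf 'more\n' >> ours.txt
git commit -q -a -m "N"
git checkout -q branch2
printf 'theirs v2\n' > subdir1/file.txt
git commit -q -a -m "P"
git checkout -q branch1

# The merge base is now L, so "what they changed" is only subdir1.
merge_base=$(git merge-base branch1 branch2)
git merge -q -m "second merge" branch2

subdir1_now=$(git show HEAD:subdir1/file.txt)
subdir2_now=$(git show HEAD:subdir2/file.txt)
echo "subdir1: $subdir1_now, subdir2: $subdir2_now"
```

If commit P had also changed subdir2/file.txt, both sides would have changed that file since L, and this second merge would stop with a conflict instead.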


2 This helps show why the word branch is problematic in Git. If we want to be precise, we should use the phrase branch name when talking about names like master, branch1, and branch2. Structural branches (where H forks if you're reading forwards, or M forks when Git is reading backwards) don't have great names. I like to call them DAGlets: see my answer to What exactly do we mean by "branch"?

3 The fact that it's been 2 steps back each time, on both "legs", is a sort of coincidence forced by me trying to draw pretty graphs. Often it's a different number of steps on each leg, and in some cases, it's no steps back from one or both commits. However, when going no steps back, the merge is either trivial (and Git will by default do something else, a fast-forward, rather than actually merging) or already done (and Git will just say that you're up to date and do nothing).


Summary

  • The merging action (the to merge part of git merge) merges commits. That is, it looks at the snapshots in each commit.
  • The merge process uses the history, which is a result of the graph recorded by earlier commits (including merge commits), to find the merge base.
  • You can deliberately pause Git after the merge-as-a-verb part and make changes.
  • When you finish this part, and use git merge --continue or git commit to make the merge-as-a-noun, the resulting snapshot will be whatever you made it to be, while Git was paused.

This is how you can achieve what you want. Since you're dealing with git-subtree elsewhere (I assume), the fact that this makes a later merge "harder" in some sense may be irrelevant: if you need updated subdir2 files, you can just git checkout or git restore -SW them from the appropriate commit.
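As a sketch of that last step, here is one way to pick up the skipped subdir2 files later, using the git checkout form. The repository layout, file names, and user identity are again made up for the demo.

```shell
#!/bin/sh
# Sketch: after a partial merge, fetch the skipped subdir2 files from branch2.
set -eu
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example User"        # made-up identity for the demo
git config user.email "user@example.com"

mkdir subdir1 subdir2
printf 'base\n' > subdir1/file.txt
printf 'base\n' > subdir2/file.txt
git add .
git commit -q -m "H"
git branch branch2

printf 'ours\n' > ours.txt
git add ours.txt
git commit -q -m "J"

git checkout -q branch2
printf 'theirs\n' > subdir1/file.txt
printf 'theirs\n' > subdir2/file.txt
git commit -q -a -m "L"

# The partial merge M: their subdir1, our subdir2.
git checkout -q branch1
git merge -q --no-commit branch2
git checkout HEAD -- subdir2
git commit -q -m "M"
before=$(git show HEAD:subdir2/file.txt)

# Later: copy subdir2 from the tip of branch2 into the index and working
# tree, then record that as an ordinary (non-merge) commit.
git checkout branch2 -- subdir2
git commit -q -m "take subdir2 from branch2"
after=$(git show HEAD:subdir2/file.txt)
echo "subdir2 before: $before, after: $after"
```

This records the update as a plain commit on branch1; the commit graph itself does not change shape.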
