
How to merge main and master branches?

I created a git repo about a month ago where the main branch was called 'master'. A couple days ago when I tried to commit and push to the same repo it pushed my changes to the 'master' branch, but I got a message saying the main branch has been changed to the 'main' branch.

I have tried merging but I get an error saying unable to merge unrelated histories (obviously they're going to have unrelated histories cause the 'main' branch was just created)

Now all my code is on the 'master' branch which is not the main branch, so I was wondering how I could move everything to the 'main' branch?

FYI: I did do a little research and I understand the whole reason why GitHub made this change, I just want to know how to figure this out.

The thing to realize about Git is that it is only commits that matter. Commits are what Git is all about. The commits themselves find the other commits, in a twisty little ball of commits, once you get into the commits. So: what are branch names good for? It's not nothing, but it's kind of close.

The real name of a commit is its hash ID. But commit hash IDs seem random, and there is no way to predict what the hash ID of some commit is. Once you find one commit, you can use that commit to find more commits. But you have to find one of them first, somehow, and that's where a branch name comes in. A name lets you get started. It gets you into the nest of commits. From the name, you can now find the hash ID of some particular commit. That commit lets you find another commit, which lets you find still another commit, and so on.

Now all my code is on the 'master' branch which is not the main branch, so I was wondering how I could move everything to the 'main' branch?

The TL;DR here is that you're in a tricky situation and there is no single right answer. You will have to decide what you want to do. You can:

  • rename your own master branch to main and try to get all other users of clones of the original repository to use your commits; or
  • figure out how to combine and/or re-do some or all commits in the two repositories.

In other words, all you might have to do is rename the branch. But there is definitely still some problem, because right now you have two branch names. It's time to take a closer look at this whole thing: why is it the commits that matter, and how do these names really work?

Long

Let's start with the simplest form of related commits: a small, simple, linear chain. Suppose we create a new, totally-empty repository with no commits in it. There's a rule about Git branch names: a branch name must hold the hash ID of exactly one (1) existing, valid commit. [1] Since there are no commits, there can be no branch names.

To fix this problem, we make our first commit. If you use GitHub, they'll often make that first commit for you, creating one with just a README and/or LICENSE type file in it. Having that first commit allows you to create as many branch names as you like: they'll all store that one commit's hash ID.

Note that every commit gets its own unique hash ID. This hash ID is universal across all Git repositories everywhere. [2] This is why Git hash IDs are as big and ugly as they are. [3] It also allows Git programs to connect to other Git programs that are using other Git repositories, and figure out which commits each repository has, just by exchanging hash IDs. So the hash IDs are crucial. But they're quite useless to humans, who can't keep them straight. So that's why we have branch names.

There is one other thing to know about these hash IDs and the underlying objects (commits, and the non-commit objects that Git stores, mentioned in footnote 1): the hash IDs are simply fancy checksums of the stored object. Git looks up the object (the commit, or its related data) using the hash ID, but then also makes sure that the stored object's checksum matches what it used to look it up. So no part of any stored object, in Git, can ever change. If the checksum does not match, Git declares the storage to be corrupted, and refuses to proceed.
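As a rough illustration of the "fancy checksum" idea, the sketch below asks Git for the ID of a small blob and then recomputes the same ID by hand. It assumes the default SHA-1 object format and a coreutils-style sha1sum command; the sample content hello is made up for the example.

```shell
# Ask Git for the object ID of a blob containing "hello\n" (nothing is stored).
git_id=$(printf 'hello\n' | git hash-object --stdin)

# Recompute it manually: SHA-1 over the header "blob <size>\0" plus the content.
manual_id=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

echo "git:    $git_id"
echo "manual: $manual_id"
```

Because the ID is derived from the content, changing even one byte gives a different ID, which is why a stored object cannot be altered in place.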

Anyway, let's say we started with one commit, on one branch named bra, and then created two more commits, so that we now have a tiny repository with just three commits in it. Those three commits have three big ugly hash IDs, unique to those three commits, but we'll just call them commits A, B, and C. Let's draw them like this. Each element in this drawing has a purpose:

A <-B <-C   <--bra

Commit C stores two things: a snapshot of every file, and some metadata. The snapshot acts as the commit's main data and lets you get back all the files, in whatever form they had at the time you (or whoever) made commit C. The metadata include the name of the person who made the commit, their email address, and so on; but crucially for Git itself, the metadata in commit C include the hash ID of earlier commit B.

We say that commit C points to B . By reading out commit C , Git can find the hash ID of earlier commit B .

Commit B , of course, also contains data—a full snapshot of every file—and metadata, including the hash ID of earlier commit A . So from B , Git can find A .

Commit A is a bit special, because it was the first-ever commit. It has no backwards-pointing arrow leading to any earlier commit, as there was no earlier commit. Git calls this a root commit . It lets Git stop going backwards.

The commit we need to use to find all other commits, in this repository, is commit C . To find commit C , we use the branch name, bra . It contains the hash ID of commit C , so bra points to C , and that's how we get started.
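A minimal sketch of this chain in a throwaway repository follows. The directory, file name, and identity are placeholders invented for the example; only the branch name bra and the commit labels come from the text above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/bra      # call the initial branch "bra"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# Three commits in a row: A, B, C.
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

tip=$(git rev-parse bra)        # the hash ID stored in the name bra (commit C)
parent=$(git rev-parse bra^)    # the parent recorded in C's metadata (commit B)

git cat-file -p "$tip"          # shows C's tree, parent, author, committer, message
git --no-pager log --format='%h %s' bra   # walks backwards from C to the root
count=$(git rev-list --count bra)
echo "commits reachable from bra: $count"
```

The name gives Git one hash ID, and every other commit is found by following the parent lines from there.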


[1] There's no such thing as an existing but invalid commit. The point of saying "existing, valid commit" is really that hash IDs are used for more than just commits, so you could have a valid hash ID, but for something that's not a commit. But you won't be dealing with these non-commit hash IDs yet, if ever. You do have to deal with commit hash IDs, so those are the ones we care about.

[2] Technically, two different commits could have the same hash ID as long as those two Git repositories never meet. A commit meeting its doppelgänger causes tragedy and sadness, so that's bad. (Well, technically, what happens is that the two Gits, as they're having Git-sex so as to exchange commits, simply malfunction. The sadness is in the users of those Gits, who expected some sort of beautiful baby.)

[3] As of a few years ago, even this is starting to become insufficient. See "How does the newly found SHA-1 collision affect Git?" for details.


Adding new commits on one branch

Given that we have:

A <-B <-C   <--bra

we start by extracting commit C into a work area. The contents of each commit can't be changed, and that includes the stored files, so we work on extracted copies instead. [4] We now have commit C "checked out". Git uses the name bra to remember the hash ID of C, and knows that the current commit has this hash ID.

We now make any changes we like: add new files, delete existing files, update files, and so on. We inform Git about these updates with git add. [5] Then we build a new commit with git commit. Git saves away the new snapshot, and adds the appropriate metadata, including the current commit's hash ID, to produce a new commit D that points back to existing commit C:

A <-B <-C   <--bra
         \
          D

As the last step of git commit, Git stores the new commit's hash ID into the branch name. Since commit D points back to existing commit C, we now want to start our view of the repository, via the branch named bra, by looking at commit D:

A <-B <-C <-D   <--bra

and the commit is now complete.
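The add-then-commit cycle described in this section can be sketched as follows, again in a scratch repository with a placeholder file name and identity. The point is only to observe the name bra before and after the new commit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/bra
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

before=$(git rev-parse bra)     # hash ID of C

echo D > file.txt               # change something in the work area
git add file.txt                # tell Git about the update
git commit -q -m D              # build D; Git writes D's hash ID into bra

after=$(git rev-parse bra)      # now the hash ID of D
new_parent=$(git rev-parse bra^)
echo "bra moved: $before -> $after"
```

Commit C itself is untouched; only the name moved, and D's parent is the old tip.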


[4] The files' contents are stored as blob objects inside the repository. This compresses them and de-duplicates them, so that when two commits share the same file contents, they literally share the internal objects. You don't normally need to know or care about this, though.

[5] The git add step manipulates the thing that Git calls, variously, its index, or the staging area, or (rarely these days) the cache. To save space in this answer, I leave out all the useful details.


Multiple branch names

To use more than one branch, we normally add a new branch name, using git branch and git checkout , or combining the two with git checkout -b (or in Git 2.23 or later, git switch -c ). The way this actually works is that it just creates the new branch name, pointing to the same commit as the current commit:

A--B--C--D   <-- bra, nch

We now have two branch names but both select the same commit . Right now, it does not matter which name we use, because both names select commit D . But in a moment, it will become important—and Git always wants to be able to tell us which branch we're "on", so that git status can say on branch bra or on branch nch . To make that work, Git attaches the special name HEAD to one branch name, like this:

A--B--C--D   <-- bra (HEAD), nch

or this:

A--B--C--D   <-- bra, nch (HEAD)

Whichever name has HEAD attached to it, that's the current branch name . Whichever commit this name points to , that's the current commit .

Now we'll create a new commit in the usual way. It gets a new unique hash ID, but we'll just call it commit E , to keep our sanity: only a computer can handle the real hash IDs. Let's draw it in:

A--B--C--D   <-- bra
          \
           E   <-- nch (HEAD)

The branch name that got updated is nch , because that's our current branch . The current commit is now commit E , and that's the commit we have checked out.

If we git checkout bra , or git switch bra in Git 2.23 or later, we choose bra as our current branch and commit D as our current commit . So commit D becomes the one checked out:

A--B--C--D   <-- bra (HEAD)
          \
           E   <-- nch

Now any new commit we make will update the name bra :

           F   <-- bra (HEAD)
          /
A--B--C--D
          \
           E   <-- nch

This is the sort of branching we usually do, in a Git repository. Note that commits A through D are on both branches, because no matter which name we start with, when we work backwards, we find all those commits. But the only way to find commit E is to start with the name nch. The only way to find commit F is to start with the name bra.
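Here is a small sketch of the two-branch picture, assuming a scratch repository with made-up file names. It uses git checkout -b so that it also runs on Git versions older than 2.23.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/bra
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C D; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

git checkout -q -b nch                 # new name nch, same commit D, HEAD attached to nch
git symbolic-ref --short HEAD          # prints the current branch name
echo E > e.txt
git add e.txt
git commit -q -m E                     # only nch moves

git checkout -q bra                    # HEAD attached to bra, commit D checked out
echo F > f.txt
git add f.txt
git commit -q -m F                     # only bra moves

git --no-pager log --graph --oneline --all
current=$(git symbolic-ref --short HEAD)
```

The graph output should show E and F as two separate tips that share D as their parent.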

Branch names find commits

This is what branch names are good for. They find the starting (well, ending?) commit of the branch. In fact, that's how branches are defined, in Git. The name holds the hash ID of the last commit on the branch. Whatever hash ID is in the name, that's the last commit, even if there are more commits. When we have:

           F   <-- bra
          /
A--B--C--D   <-- main
          \
           E   <-- nch

there are three last commits, even though there are two commits after D . There are three ways to find commits ABCD , too: we can start with the name main and work backwards, or we can start with either of the other two names and work backwards.

How history relates

Suppose we have this:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

We can pick either of these two branch names—and hence either commit J or commit L —and then ask Git to merge the other last commit. Without going into any of the rest of the important details, the way Git handles this merge request is to work backwards to find the best shared commit , which in this case, is commit H . The merge then proceeds using commit H as the merge base .

This all works because the two branch tip commits, J and L , are related: they have a shared parent (well, grand-parent, in this case). This shared parent is a common starting point. They can therefore be converted to changes since the common starting point .
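To see the merge base for yourself, a sketch along these lines may help. It rebuilds the G-H, I-J, K-L shape in a scratch repository; the file names are invented and chosen so that the merge has no conflicts.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in G H; do
    echo "$name" > base.txt
    git add base.txt
    git commit -q -m "$name"
done
git branch br2                         # br2 also points to H for now

for name in I J; do                    # two commits on br1
    echo "$name" > one.txt
    git add one.txt
    git commit -q -m "$name"
done
git checkout -q br2
for name in K L; do                    # two commits on br2
    echo "$name" > two.txt
    git add two.txt
    git commit -q -m "$name"
done

base=$(git merge-base br1 br2)         # best shared commit
git --no-pager log -1 --format='merge base: %h %s' "$base"

git checkout -q br1
git merge -q --no-edit br2             # works: the tips are related through H
```

After the merge, the tip of br1 is a commit with two parents, J and L.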

Changing a branch name is trivial

Each Git repository has its own private branch names. When you hook two Git repositories to each other, what really matters is the commit hash IDs, because they can't change and they uniquely identify the commits. So if we have:

A--B--C   <-- bra (HEAD)

we can just arbitrarily change this name to any new name we like:

A--B--C   <-- xyzzy (HEAD)

Nobody cares whether the name is bra or xyzzy or whatever—well, except for irrational humans, who have ideas pop into their heads when we use evocative names, like plugh or colossal-cave-adventure . And, when we're using Git clones to share work, we humans like to share our branch names too, to help keep our own sanity. So we don't normally go about renaming branches willy-nilly. But the actual names really don't matter, not to Git at least.

If this were your own situation—you have a master , they changed the name to main —you could just rename your master to main yourself, and you and they would both use the same name to find the same commits. This would be easy and simple. It's not your situation, though, because for this to be your situation, you would not be seeing that complaint about unrelated histories.
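A quick sketch of how little a rename does, using a scratch repository and the example names bra and xyzzy from above:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/bra
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

before=$(git rev-parse bra)
git branch -m bra xyzzy                # rename the branch; no commit is touched
after=$(git rev-parse xyzzy)
old_name=$(git branch --list bra)      # empty: the old name is gone
echo "same commit under the new name: $before = $after"
```

In the simple case described here, the equivalent would be git branch -m master main, followed by git branch -u origin/main main so that your renamed branch follows their new name.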

More than one root commit

All of the diagrams above have only one root commit: in our case, commit A . (Well, the ...--G--H probably has a single root commit.) But there are a bunch of different ways, in Git, to create extra root commits. One method is using git checkout --orphan (or git switch --orphan ). Suppose we start with:

A--B--C   <-- bra (HEAD)

and then use this technique to create a new root commit D , that doesn't point back to C , or to anything, named nch :

A--B--C   <-- bra

D   <-- nch (HEAD)

This works fine in Git, and we can go on and create more commits if we like:

A--B--C   <-- bra

D--E--F   <-- nch (HEAD)

What we can't do, now, is simply merge these two branches, because git merge needs to find the best common ancestor. Git does this by starting at each end and working backwards until the histories meet... and in this case, they never meet. One history ends (starts?) at A, and the other ends (starts?) at D, without ever coming across the same commit on both branches.
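The sketch below reproduces this situation in a scratch repository. It assumes Git 2.9 or later, where git merge refuses unrelated histories by default; the file names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/bra
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

git checkout -q --orphan nch           # the next commit on nch will have no parent
git rm -rf -q .                        # empty the index and work tree
echo D > other.txt
git add other.txt
git commit -q -m D                     # new root commit D

git rev-list --max-parents=0 bra nch   # lists the root commits: one per history

shared=yes
git merge-base bra nch > /dev/null || shared=no

merged=yes
message=$(git merge --no-edit bra 2>&1) || merged=no
echo "$message"                        # Git's explanation for refusing
echo "shared ancestor: $shared, merged: $merged"
```

With no commit reachable from both names, git merge-base finds nothing and the merge stops before changing anything.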

Multiple repositories

With all of the above in mind, let's add clones into the picture. Remember that each Git repository is, essentially, two databases:

  • One database contains commit objects, and other internal Git objects. Each object has a big ugly hash ID as its key, and Git looks up the actual values in a simple key-value datastore .

  • The other database has names—branch names, tag names, and other such names—each of which stores one hash ID. These hash IDs get you into the commits, so that you can find all the commits.

When you run git clone url , you have your Git create a new, empty repository, with no commits and no branches in it, then call up some other Git and have that Git look at some other repository, based on the URL you gave. That other Git has its two databases: commits and other objects (keyed by hash ID), and name-to-hash-IDs (keyed by names). They send, to your Git, all the objects, which your Git puts into your own database.

You now have all their commits, and none of their branch names .

In order to find these commits, your Git takes their branch names and changes them. Instead of, say, master or main , your Git makes up names like origin/master or origin/main . These names are your Git's remote-tracking names . They remember the hash IDs that their Git had in their branch names .

These remote-tracking names work just as well to find commits. You don't actually need any branch names at all, just yet. But git clone has not quite finished: its last step is to run git checkout (or git switch ), to pick some branch name for you.

Of course, you have no branches yet, but git checkout / git switch has a special feature: if you ask Git to check out a name that does not exist, your Git scans your remote-tracking names . If they have a master , you now have an origin/master , and when you try to git checkout master , your Git will create your own new master , pointing to the same commit as your origin/master . That, of course, is the same commit as their master !

This means you now have, in your own repository:

A--B--C   <-- master (HEAD), origin/master
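To watch the clone step happen, one option is to clone a local scratch repository instead of a hosted one. In this sketch the directories theirs and mine are hypothetical stand-ins for the two sides.

```shell
set -e
work=$(mktemp -d)

# "Their" repository, with three commits on master.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

# Your clone: copies the commits and records their branch names as origin/*.
git clone -q "$work/theirs" "$work/mine"
cd "$work/mine"
git --no-pager branch -a               # your master, plus the remotes/origin/ names

mine=$(git rev-parse master)
tracking=$(git rev-parse origin/master)
theirs=$(git -C "$work/theirs" rev-parse master)
echo "master=$mine origin/master=$tracking their master=$theirs"
```

All three hash IDs should be identical: your master, your origin/master, and their master name the same commit C.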

Now, suppose they change their name master to main . If that's all they do—if they just rename their branch—you'll end up with this, after you run git fetch to get any new commits from them (there are none) and update your remote-tracking names:

A--B--C   <-- master (HEAD), origin/master, origin/main

Your Git adds origin/main to your repository, to remember their main. They have, in effect, deleted their name master, and your Git probably should delete your origin/master to match, but the default setup for Git does not do this. [6] So you end up with two remote-tracking names, one of them stale. You can clean this up manually with:

git branch -d -r origin/master

or:

git fetch --prune origin

(The git fetch has the side effect of updating all your remote-tracking names right then, including getting any new commits from them, so that's usually better. It takes longer though, as it has to call up their Git over the Internet, or wherever the URL goes.)
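The stale-name behavior can be reproduced with two local scratch repositories, as sketched below. The directory names are hypothetical, and the first fetch passes -c fetch.prune=false only so the demonstration behaves the same even if pruning is already enabled in your global configuration.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo A > file.txt
git add file.txt
git commit -q -m A
git clone -q "$work/theirs" "$work/mine"

git branch -m master main              # they rename their branch

cd "$work/mine"
git -c fetch.prune=false fetch -q origin   # adds origin/main, keeps origin/master
stale_before=no
git show-ref --verify --quiet refs/remotes/origin/master && stale_before=yes

git fetch -q --prune origin            # drops remote-tracking names they no longer have
stale_after=no
git show-ref --verify --quiet refs/remotes/origin/master && stale_after=yes
echo "origin/master present before prune: $stale_before, after prune: $stale_after"
```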


[6] To make Git behave this way, for all your repositories, use git config --global fetch.prune true.


If they'd done that, things would be reasonable

Suppose they did do just that: rename their master to main , without actually adding or deleting any commits . Or, they might do the renaming, and then add more commits. Let's draw the latter: it's a bit more complicated but it all works out the same, in the end.

They had:

A--B--C   <-- master

and you ran git clone and got:

A--B--C   <-- master (HEAD), origin/master

in your own repository. (We can leave out the HEAD in their repository because we don't normally care which branch they check out.) Then they rename their master to main and add commits D and E. You run git fetch and get:

A--B--C   <-- master (HEAD), origin/master
       \
        D--E   <-- origin/main

Your Git fails to delete origin/master , even though they have no master any more, so we leave it in the drawing. Note that it's harmless: it just marks commit C . We can delete it—we can set fetch.prune or run git fetch --prune or whatever—or leave it; it's not really important. Branch names don't matter. Only commits matter. Commit C is still there, whether or not there's a name pointing to it.

Anyway, perhaps you make your own new commit F :

        F   <-- master (HEAD)
       /
A--B--C
       \
        D--E   <-- origin/main

If you ask your Git to merge commits F and E , it works , because they have a common ancestor: F 's parent is C , and E 's parent's parent is C .

Your merge, however, fails with the unrelated-histories complaint, which tells us that this is not what they did.

What seems to have happened instead

If we assume that you did not make a bunch of unrelated commits, what must have happened, in their Git repository—over on GitHub—is that they made a new root commit, and used the name main to find it:

A--B--C   <-- master

D   <-- main

Then, they probably deleted their name master . That left them, in their repository, with this:

A--B--C   ???

D   <-- main

At this point—or just before it—they may or may not have copied some or all of their ABC commits to new commits that come after D :

A--B--C   ???

D--B'-C'  <-- main

Here, commit B' is a copy of commit B : it does to D whatever B did to A . Likewise, C' is a copy of C , doing to B' whatever C did to B . The new commits have new and different hash IDs and point backwards to commit D as their root, though. So when you run git fetch to connect your Git to their Git, their new commits are these D-B'-C' ones, so that you, in your repository, wind up with:

A--B--C   <-- master (HEAD), origin/master

D--B'-C'  <-- origin/main

If you delete your origin/master (since their master is gone), nothing really changes: your own Git is still finding commit C . Their Git can't find commit C —they may even have thrown it away by now; Gits eventually delete un-find-able commits—but your Git can, through your master . If you've made new commits since then, like the F we drew earlier, you even have this:

        F   <-- master (HEAD)
       /
A--B--C   <-- origin/master

D--B'-C'  <-- origin/main

You can't do a merge because these chains have no shared history.
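One way to check whether your repository is in this state is to compare root commits and ask for a merge base. The sketch below simulates the suspected sequence with two local scratch repositories; the directory names and README.md content are assumptions, and what actually happened on the hosting side may differ in detail.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done
git clone -q "$work/theirs" "$work/mine"

# What they seem to have done: a new root commit named main, then master removed.
git checkout -q --orphan main
git rm -rf -q .
echo "readme" > README.md
git add README.md
git commit -q -m D
git branch -q -D master

cd "$work/mine"
git fetch -q origin
my_root=$(git rev-list --max-parents=0 master)
their_root=$(git rev-list --max-parents=0 origin/main)
related=yes
git merge-base master origin/main > /dev/null || related=no
echo "my root: $my_root"
echo "their root: $their_root"
echo "related: $related"
```

Different root commits and no merge base are the signature of the "unrelated histories" error. In your real repository, the last few commands (from git fetch onward) are the diagnostic part.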

So what can you do?

You are now faced with a bunch of choices. Which one to use depends on how much work you want to do, how much work you want to make other people do, and how much control you have over the other Git repositories.

You can:

  • Keep using your commits (only) and force everyone else to switch.

    There was no reason to change the commits. The originals are still just as good as they ever were. Someone made a mistake, copying them. Make them eat their mistake: rename your master to main, use git push --force origin main, and make the GitHub (or other central storage server) repository use your commits, under the name main that everyone has agreed to. Note that the forced push replaces whatever the server's main held before.

  • Copy the commits of yours that you like, adding them to the end of their last commit.

    Assuming that their commit C' has the same saved snapshot as your (and originally their) commit C , or whatever commit it is that's the last copy of an original, you can probably just add your work after C' , using git cherry-pick for each commit, or git rebase --onto to do multiple cherry-pick operations. See other StackOverflow questions for how to do that.

  • Merge with --allow-unrelated-histories .

    This technique can take the least time and effort on your part, but it could be messy and painful: the rebase / cherry-pick option in the middle may be faster and easier. All that --allow-unrelated-histories does is pretend that, before the separate root commits, there was a single commit with no files in it. In some cases, this works easily. In most cases, you get a bunch of "add/add conflicts" that require a lot of manual work.

    It also has the rather ugly side effect of leaving extra, mostly-useless commits in your repositories, which you then carry around forever. If nobody looks at this history (and the two roots), nobody will care , but it's still there. Whether it bothers you (or others) is another question entirely.
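The second and third options in the list above can be tried out safely in a scratch repository before touching real work. The sketch below is a simplification: their history is a single root commit D rather than D-B'-C', the branch name rescue is made up, and the files are chosen so that nothing conflicts. The merge flag needs Git 2.9 or later.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C F; do                # your history, ending in your own commit F
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
git checkout -q --orphan main          # stand-in for their unrelated history
git rm -rf -q .
echo "readme" > README.md
git add README.md
git commit -q -m D

# Second option: copy commit F onto the end of their history.
git checkout -q -b rescue main
git cherry-pick master > /dev/null     # re-applies what F changed relative to C

# Third option: merge the two histories despite the missing common ancestor.
git checkout -q master
git merge -q --no-edit --allow-unrelated-histories main
roots=$(git rev-list --max-parents=0 master | wc -l | tr -d ' ')
echo "master now reaches $roots root commits"
```

The cherry-pick leaves a single clean line of history on rescue, while the merge leaves master with both root commits reachable, which is the "extra commits you carry around forever" trade-off mentioned above.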

There's no way I can pick one of these options for you, and this isn't necessarily the universe of all options, but by this point you should at least have a good understanding of what happened, and why these are ways to deal with it.
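For the first option (rename and force-push), the following sketch rehearses the commands against a local bare repository that stands in for the hosted one. The paths are hypothetical. On a real server, a forced push discards the commits that main previously pointed to, so it only makes sense if nobody needs them.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"  # stand-in for the hosted repository
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done
git remote add origin "$work/server.git"
git push -q origin master

# Simulate the unrelated main that appeared on the server.
git checkout -q --orphan main
git rm -rf -q .
echo "readme" > README.md
git add README.md
git commit -q -m D
git push -q origin main
git checkout -q master
git branch -q -D main                  # locally, only master remains

# First option: rename, then overwrite the server's main with your commits.
git branch -m master main
git push -q --force -u origin main
server_main=$(git --git-dir="$work/server.git" rev-parse main)
echo "server main is now $server_main"
```

The server's old master name still exists after this. On a hosting service you would typically switch the default branch in the repository settings before deleting it.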
