
Git create branch that is equal to master of fork parent

I want a branch of my fork to be equal to the master branch of the fork parent. However, the master branch on my fork is ahead of the parent, and I would like to make changes on a branch before getting rid of the changes on my master branch. How can I do this?

Let's define a term here (though GitHub do actually define it themselves): the "fork parent" is the repository you were browsing when you decided you wanted to create a GitHub fork. You clicked their "fork" button and now you have, under your account, a repository that is a clone of their GitHub repository.

Since a GitHub fork is a clone—albeit one with added features—we now have two repositories. Both repositories are ordinary Git repositories. They just reside over on GitHub. We can, if we wish, clone either or both repositories to our own laptop or other computer. Let's hold off on doing that yet, so that we only have two Git repositories to deal with for now.

At this point, having pushed the "fork" button, these two clones are identical (except for the added features, but those lie outside Git proper). A Git repository consists primarily of two databases:

  • There is a database of all "Git objects": commits, mainly, and their supporting objects, plus some stuff for annotated tags. These are all read-only: none can ever be changed. The contents of these objects, particularly the commits, are of interest to humans. There's one problem though: the contents are found by their hash IDs, big ugly numbers written in hexadecimal, that are useless to humans. They look random (though they aren't random at all) and they are unpredictable.

  • Separate from the database of objects, there's a database of names: branch names, tag names, and other names. These names are human-readable and often actually mean something to humans. What's in the database-of-names are the hash IDs of the commits (and the annotated tag data, for annotated tags).
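To see the two databases side by side, here is a minimal sketch that builds a throwaway repository in a temporary directory. The identity values and the single empty commit are assumptions made only for the demonstration.

```shell
set -eu
# Throwaway identity so that git commit works anywhere (an assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git commit -q --allow-empty -m "first commit"

# Names database: each name, with the hash ID (and object type) it stores.
git for-each-ref

# Objects database: look an object up by its hash ID.
hash=$(git rev-parse refs/heads/main)
obj_type=$(git cat-file -t "$hash")
echo "$hash is a $obj_type"
```

The name is short and readable; the hash ID it stores is the key into the objects database.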

A regular clone, like the one we would make on a laptop, copies all the objects and none of the branches. But the clone made by the GitHub fork button copies both. So right now, our fork has the same set of branches as their repository: at this point, everything is equal.

Over time, though, they get unequal, because Git repositories generally have commits (and other objects) added to their objects databases. When that happens with our fork (our GitHub repository), our repository gets "ahead of" theirs. When that happens with their repository, theirs gets "ahead of" ours. When it happens to both, they diverge. This is entirely normal and natural.

You would like to resolve this divergence somehow. But you have conditions:

However, the master branch on my fork is ahead of the parent, and I would like to make changes on a branch before getting rid of the changes on my master branch. How can I do this?

At this point, we need some kind of understanding of what a branch is. Saying branch X is ahead of branch Y or branches X and Y have diverged is all well and good, but what does that really mean? If a Git repository is two databases (and it is), what has actually happened in the two databases?

We already know part of the answer, based on what we said above: some new objects went into the objects database(s). Objects rarely, if ever, get removed from an objects database (and if they do it's all automatic). Git is built to add commits, not take them away.

Given this idea of never removing commits, only ever adding them, what's really critical to the humans is not the commits themselves (though for Git itself, that's all that's really critical) but rather which commits we can find, and how. And for that, we need the branch-and-other-names database.

The first key to understanding this is that each commit records the hash ID of some earlier commit(s). So from a commit, we can go backwards, to older commits. But no commit ever records the hash ID of any newer commit. That's mainly because we build these things one commit at a time. Each one gets a new, unique, random-looking and unpredictable hash ID. We don't know what the future hash IDs will be: we only know what the past ones were. So a child commit (one that's new, made from some older parent commit) is allowed to know the hash ID of its parent, which already exists. But the child does not yet know what children, if any, it will have once it's grown up. And once the child is born, no part of it can be changed, so it can't record its children later either.

This gives us a backwards-looking chain: the most recent child, whatever it is, remembers his parent, who remembers her parent—the grandparent of the most recent child—and so on:

... <-F <-G <-H

where each uppercase letter stands in for some random-looking hash ID. But if that's true (and it is), there's still one problem: How will we find the hash ID of the most recent commit?

This is where branch names enter our picture. A branch name just records one hash ID. That one hash ID is the ID of the most recent commit on the branch. So the picture really looks like this:

... <-F <-G <-H   <--main

The name main holds the hash ID of the last commit H. That commit holds a snapshot of all files, and the hash ID of its parent commit G. Commit G holds a snapshot of all files, and the hash ID of an earlier commit F. This repeats for every commit, until we get back to the very first commit ever. That commit, called a root commit, has no parent, because there was no earlier commit for it to record. That's also how Git knows to stop going backwards.
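The backwards links can be inspected directly. The sketch below assumes a throwaway repository with two empty commits whose messages, G and H, merely echo the letters in the drawing.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G (root commit)"
git commit -q --allow-empty -m "H"

# Raw contents of the latest commit: a tree line, a parent line,
# then author, committer, and the message.
git cat-file -p main

recorded=$(git cat-file -p main | sed -n 's/^parent //p')
parent=$(git rev-parse main^)            # main^ means "the parent of main"

# The root commit has no parent, so asking for one yields nothing.
root_parent=$(git rev-parse -q --verify main^^ || true)
echo "H records parent $recorded; root commit's parent: '${root_parent}'"
```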

To add to a branch, we start by extracting the latest commit:

...--G--H   <-- main

We extract the contents of commit H. We do some work with that and then make a new commit. The new commit gets a new, unique hash ID, one that is unique across every Git repository everywhere in the universe (this is why the hash IDs are so big and ugly), and that commit and its supporting objects go into the big database of objects:

...--G--H   <-- main
         \
          I

Git makes sure that new commit I, before it's created and thus acquires its hash ID, has H's hash ID stored in it, so that I will link back to H. Git knows to use H's hash ID because the name main still points to H. But now that I is in the big database, Git does its last little trick: it writes the hash ID of I into the names database, under the "branch name main" entry. So now we have:

...--G--H
         \
          I   <-- main

and we can draw this as a straight line if we prefer:

...--G--H--I   <-- main
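A quick way to watch the name move is to read it before and after a commit. This is a sketch in a throwaway repository; the commit messages stand in for the letters above.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

before=$(git rev-parse main)             # hash ID of H
git commit -q --allow-empty -m "I"       # make new commit I
after=$(git rev-parse main)              # the name now stores I's hash ID
new_parent=$(git rev-parse main^)        # I links back to H

echo "main moved from $before to $after"
```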

At this point, the two clones have diverged: one has GHI with its main pointing to I, and the other has GH with its main pointing to H. The divergence is that one is strictly ahead of the other. Which one is "ahead"? The one we added commit I to, of course. The other one is "behind".

If we, or anyone, now add some new commit to the "behind" repository, that new commit gets a new, totally-unique hash ID. Let's call it J for short. So now we'll have:

Repo 1:

...--G--H--I   <-- main

Repo 2:

...--G--H--J   <-- main

Note how, in each repository, the name main selects the last commit. The last commits differ in the two repositories, and each repository is somehow both ahead of, and behind, the other.
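Git can count this "ahead and behind" state. The sketch below uses two local directories, repo1 and repo2, as stand-ins for the two repositories; both names are assumptions for the demonstration.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/repo1"
cd "$work/repo1"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git clone -q "$work/repo1" "$work/repo2"  # both end at H for now
git commit -q --allow-empty -m "I"        # Repo 1 adds I

cd "$work/repo2"
git commit -q --allow-empty -m "J"        # Repo 2 adds J
git fetch -q origin                       # copy I into Repo 2's objects

# Left count: commits only on our main. Right count: only on theirs.
set -- $(git rev-list --left-right --count main...origin/main)
ahead=$1
behind=$2
echo "Repo 2's main is $ahead ahead of and $behind behind Repo 1's main"
git --no-pager log --oneline --graph main origin/main
```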

If we want to synchronize the two repositories, we have a big problem. Let's say we wish to synchronize Repo 2 to Repo 1. We can start by grabbing commit I from Repo 1 and shoving it into Repo 2. Because the hash IDs of commits are totally unique, H is the same in both repositories, but I and J are different. So now Repo 2 has:

          I   ???
         /
...--G--H--J   <-- main

The fact that we've drawn commit I on a separate line is not important (for the same reason we were able to move it around earlier). But the fact that the name main does not point to it is important, because it's that name that we want to use to find I. If we make Repo 2's main point to I, we get:

          I   <-- main
         /
...--G--H--J   ???

which we can draw as:

 ...--G--H--I   <-- main
          \
           J   ???

if we like, but no matter how we draw it, we can't find commit J any more.

Rescuing commit J

What if we make a new name to point to commit J? Let's start Repo 2 off as before, without I but with J as the last commit on main. In fact, let's put several commits in Repo 2 that aren't in Repo 1:

...--G--H--J--K--L   <-- main

Let's add a new name that also points to commit L now:

...--G--H--J--K--L   <-- main, extra-name

This extra-name is a branch name, so like any branch name, it points to some commit. We chose L as the commit we want it to point to. Why did we choose L? That's obvious, isn't it? It's the commit that the name main points to.

Now that we've done this, let's grab commit I from Repo 1, and change our existing main to point to commit I, just like their main:

...--G--H--I   <-- main
         \
          J--K--L   <-- extra-name

Now we can find commit L. Commit L points back to commit K, which points back to commit J, which points back to commit H. None of our commits have changed at all. The only thing that did change is the name main, which now points to commit I, which we got from their repository.
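Here is the rescue as a sketch with two local stand-in repositories (repo1 and repo2 are assumed names). The commands that actually move the names are covered in more detail further down.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/repo1"
cd "$work/repo1"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git clone -q "$work/repo1" "$work/repo2"
git commit -q --allow-empty -m "I"        # Repo 1: ...--G--H--I

cd "$work/repo2"
for m in J K L; do git commit -q --allow-empty -m "$m"; done
old_tip=$(git rev-parse main)             # hash ID of L

git branch extra-name main                # second name for commit L
git fetch -q origin                       # grab commit I from Repo 1
git reset -q --hard origin/main           # we are on main: move it to I

new_main=$(git rev-parse main)
their_main=$(git rev-parse origin/main)
rescued=$(git rev-parse extra-name)
git --no-pager log --oneline --graph main extra-name
```

The graph output should show J, K, and L still reachable from extra-name, with main now at I.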

The mechanics of doing all of this

The problem with using GitHub to do this is that GitHub has an inflexible, limited interface: the web interface. It's pretty good for what it does—for accessing all the added features that GitHub forks give you—but it's not very good at doing the basic stuff that command-line Git does. Hence, it turns out that the way to do this is with command-line Git.

The one problem with this is that command-line Git is slightly different. We have a way to copy a repository, using git clone. But this copy operation is different from GitHub's "fork a repository" clone. What we do is run:

git clone ssh://git@github.com/user/repo.git

on our laptop or whatever kind of computer we have. (I've used an ssh:// URL here; you can use https:// if you prefer, though GitHub are starting to push people towards using ssh.) This:

  • makes a new, empty repository;
  • uses git remote add to add the URL under a short name by which we can refer to the GitHub repository later: the default short name is origin;
  • copies all of the commits that exist in the GitHub repository, but does not copy the branch names; and then
  • does a git checkout, which creates one branch name in our new local repository.

(We then have to cd repo or otherwise enter the new repository, because git clone can't make our command-line-interpreter do that for us. So we'll do that too.)
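The steps listed above can be done by hand, which makes it easier to see which names end up where. This is only a sketch: a local directory named source stands in for the GitHub repository, and real git clone does a few extra things (such as recording origin/HEAD) that are skipped here.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/source"                # stand-in for the GitHub repo
cd "$work/source"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "first"
git branch develop                        # a second branch over there

git init -q "$work/copy"                  # 1. new, empty repository
cd "$work/copy"
git remote add origin "$work/source"      # 2. remember the URL as "origin"
git fetch -q origin                       # 3. copy commits, make origin/* names
git checkout -q -b main --track origin/main   # 4. create one local branch

git for-each-ref --format='%(refname)'    # every name we now have
local_names=$(git for-each-ref --format='%(refname)' refs/heads)
remote_develop=$(git rev-parse -q --verify refs/remotes/origin/develop)
```

Only main exists as a local branch; develop is visible solely as origin/develop.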

The interesting thing here is that our Git doesn't copy their branch names. That's because our Git wants to let us make up our own branch names, which need not match theirs at all. We will probably want to have at least some of our branch names match at least some of theirs, but our Git chooses not to force this on us, even if it would make sense. Our Git is tuned to "power user" mode right from the start, even if we're Git beginners. [1] What our Git does with their Git's branch names is to change them into remote-tracking names. [2]

These remote-tracking names are formed by taking their branch names and shoving origin/ in front of them. Technically, we shove in whatever name we used for the remote in the git remote add step, and the actual full name is refs/remotes/<remote>/<name>. And, technically, your own (local) branch names have refs/heads/ shoved in front of them. But some or most of these prefixes don't normally show up:

  • If you run git branch, you'll see your (local) branch names shortened to things like main or master.
  • If you run git branch -r, you'll see your remote-tracking names shortened to things like origin/main.
  • But for some reason, [3] if you run git branch -a, your remote-tracking names are shortened to remotes/origin/main instead.

You can, however, run git for-each-ref, which finds all refs. Branch names and remote-tracking names are just two forms of reference. Tag names are a third form, and Git has a bunch more forms. The for-each-ref command, which isn't really meant for ordinary users, lists all of them, printing out their full names by default, along with the objects (well, hash IDs and types) they point to.


[1] This was almost certainly a really bad idea. Unfortunately, it's too late to change it now: Git has a "don't break existing users' work-flows" philosophy.

[2] Git calls these remote-tracking branch names. The word branch here is redundant, and if you use this phrase, you will be tempted to shorten it to remote-tracking branch, which is sort of OK, but then you will be further tempted to shorten to remote branch, which is not OK: it's confusing. Do you mean a branch name as seen on the remote, or do you mean a name, as found in the local Git repository, under the group remotes/origin/?

[3] I have no idea why. If this were showing tags, too, it might make sense. But it isn't.
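The short and full spellings of these names can be compared in any clone. The sketch below clones a local stand-in repository (the directory names source and clone are assumptions) and prints each listing.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "first"
git clone -q "$work/source" "$work/clone"
cd "$work/clone"

git --no-pager branch        # local names, shortened
git --no-pager branch -r     # remote-tracking names, shortened
git --no-pager branch -a     # both; remote-tracking ones get "remotes/"
git for-each-ref             # every ref, under its full name

local_full=$(git rev-parse --symbolic-full-name main)
remote_full=$(git rev-parse --symbolic-full-name origin/main)
echo "main is $local_full; origin/main is $remote_full"
```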


Mechanics, part 2

Now that we have a clone of our fork, we need to add to this clone any commits they have that we don't. That means we need to instruct our Git how to call up GitHub and read from their repository too.

To do this, we need a second remote. This second remote needs a name. There's a standard second name, upstream, though I don't like it much myself. You can use that one, or invent one you like better. For this answer I'm going to use upstream:

git remote add upstream <url>

For the URL, put in an ssh://git@github.com/them/their-repo.git or https://github.com/them/their-repo.git or whatever URL one would use to read from their Git repository over on GitHub: the one you used when you used the FORK button, perhaps turned into an ssh URL.

(You can, if you like, run:

git ls-remote <url>

first to see if you have the URL correct. This has your Git call up that URL and get information about their branches, tags, and other names. It is a lot like running git for-each-ref, except that it uses the Git protocol to read from some other Git.)

Once you have this set up, run:

git fetch upstream

to have your Git call up their GitHub repository. This starts with the same thing as the git ls-remote (in fact, you can now run git ls-remote upstream) but then does more: it gets, from them, any commits that they have that you don't, and adds them to your repository, locally, on your laptop or whatever. Then (this is the Power Git User thing I mentioned earlier) it creates, in your Git repository, remote-tracking names for each of their branches.

Because you named this upstream, these remote-tracking names will have the form upstream/main, upstream/develop, and so on. The upstream/ in front of each one (or remotes/upstream/ in longer form, or refs/remotes/upstream/ to use the real full name) keeps these remote-tracking names from interfering with any of your own branch names. [4]


[4] Note that if you have a branch named upstream or upstream/main of your own, its full name is refs/heads/upstream or refs/heads/upstream/main. This is different from refs/remotes/upstream/main, so the two don't collide. Git will keep them straight. But it's confusing to humans: if you're going to have a remote named upstream, don't name any of your branches upstream/whatever.
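To rehearse the second remote without touching GitHub, two local bare repositories can play the roles of the fork parent and the fork. The paths parent.git, fork.git, and laptop below are assumptions for this sketch.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "H"

# Bare clones copy branch names as-is, much like the fork button does.
git clone -q --bare "$work/seed" "$work/parent.git"        # "their" repo
git clone -q --bare "$work/parent.git" "$work/fork.git"    # "our" fork

git clone -q "$work/fork.git" "$work/laptop"
cd "$work/laptop"
git remote add upstream "$work/parent.git"
git ls-remote upstream                    # check the URL: lists their names
git fetch -q upstream                     # get commits, create upstream/*
git --no-pager branch -r

upstream_full=$(git rev-parse --symbolic-full-name upstream/main)
```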


You now have the superset that you need

At this point, in your own laptop repository, you have the combination you want/need to get everything done:

  • You have one branch name as a result of your git clone having run git checkout. That one name is probably main or master. The actual name used here is the one that the GitHub Git recommended to your Git, when you ran git clone url. You could pick a different name at git clone time, with git clone -b branch url, for instance, but you didn't, so you got main or master.

  • You have one remote-tracking name of the form origin/* for each branch name in your fork.

  • You have one remote-tracking name of the form upstream/* for each branch name in their fork.

You can now create any new branch names you like, in your local repository, pointing to any existing commit. To do so, just run:

git branch <name> <commit-hash-ID>

or:

git branch <name> <existing-name>

This tells your Git to create the new name as a branch name, pointing to the commit whose hash ID you gave, or whose hash ID was found by turning existing-name into a hash ID.

(To see the process of turning existing names into hash IDs, use git rev-parse: git rev-parse name does exactly that, reading the names database and figuring out the hash ID. Note that you can write main, or heads/main, or even refs/heads/main, to refer to your own branch named main. The exact rules for all of this are listed out in the gitrevisions documentation.)
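Both forms of git branch, and the name-to-hash lookup, can be tried in a throwaway repository. The branch names from-hash and from-name are made up for this sketch.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

g=$(git rev-parse main^)                  # hash ID of the older commit
git branch from-hash "$g"                 # new name from a raw hash ID
git branch from-name main                 # new name from an existing name

# Three spellings of the same branch resolve to the same hash ID.
a=$(git rev-parse main)
b=$(git rev-parse heads/main)
c=$(git rev-parse refs/heads/main)
by_hash=$(git rev-parse from-hash)
by_name=$(git rev-parse from-name)
```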

To forcibly move some existing branch name, in your own repository on the laptop, you have two main options:

  • While not "on" that branch, use git branch -f with the name and a hash-ID (or another second name) to force-move the name.
  • Or, if you are "on" that branch, use git reset --hard with a hash ID (or name) to force-move the branch you are currently on, to that hash ID (or the one the name resolves to).

You are "on" some branch after using git checkout or git switch to switch to that branch name. That name becomes your current branch, and git status will say On branch <whatever>. The commit hash ID stored in that branch name is your current commit. The git reset command moves you to some other commit, yanking the branch name along with it; the --hard part tells this git reset to re-set both Git's index and your working tree (neither of which is described in this answer), so any uncommitted changes to tracked files are discarded.
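The two ways of force-moving a name look like this in a throwaway repository. The branch name other is an assumption for the sketch.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
for m in G H I; do git commit -q --allow-empty -m "$m"; done
g=$(git rev-parse main~2)
h=$(git rev-parse main~1)

git branch other main                     # other starts at I, like main

# Not "on" other: force-move the name with git branch -f.
git branch -f other "$g"

# "On" main: force-move the current branch with git reset --hard.
git reset -q --hard "$h"

git status --short --branch               # first line names the current branch
moved_other=$(git rev-parse other)
moved_main=$(git rev-parse main)
```

After this, no name points to I any more, which is exactly the situation the extra name was meant to prevent earlier.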

Once you have your (local) branch names set up the way you like, use git push with your remote name origin. This has your Git call up the Git over at GitHub connected to your fork on GitHub, just like the git fetch upstream did (but to your fork, not their repository, since you're using git push origin and not git push upstream). The push conversation, however, is quite different from the fetch conversation.

A git fetch is always safe. Your Git calls up some other Git. Your Git asks that other Git: What branch and tag and other names do you have? What commits do you have that I don't? Your Git then downloads any "missing" commits and supporting objects, and creates or updates your remote-tracking names. None of your branches get changed. None of your work is affected. You just add new objects to your Git's database-of-all-objects, and update remote-tracking names.

But git push is different. The beginning part is the same: your Git calls up some other Git. Your Git lists out (some or all of) your commits by hash ID (your branch names don't matter yet, not right at this point), and their Git checks to see if it needs any new commits and other supporting objects. If it does need them, your Git will package them up and ship them over. But right at this point, the rest is very different. Your Git now asks (git push) or commands (git push --force) their Git to set some of their branch names, based on what branch names you used here.

As a now-power-Git-user, you can use a fancier form of git push:

git push origin <hash-ID-or-name>:refs/heads/<branch-name>

e.g.:

git push origin HEAD:refs/heads/new-branch

Here, by using the colon character, you get to put anything you like on the left side. Your Git will do a git rev-parse on this to figure out which commit(s) to send. Then, on the right side, you can list out a branch name. Sometimes you will need to spell it out in full like this. Sometimes you can just write new-branch. Your Git will ask (this kind of regular push) or command (force-push) their Git to create or update their branch new-branch using this commit hash ID.

Usually, though, you will probably just do:

git push origin main

Since there's no colon in main, the left and right sides are just main and main, as if you typed git push origin main:main. Your Git and their Git will figure out that you want to send your main commits to them, if they don't have them, and then have them update their name main.
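Both push forms can be tried against a local bare repository standing in for the fork. The path remote.git is an assumption for this sketch; new-branch is the example name used above.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "H"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/laptop"
cd "$work/laptop"
git commit -q --allow-empty -m "new work"

# Left of the colon: which commit to send. Right: the name they should set.
git push -q origin HEAD:refs/heads/new-branch
# No colon: same name on both sides, as if we wrote main:main.
git push -q origin main

git ls-remote origin                      # names as their Git now has them
local_head=$(git rev-parse HEAD)
remote_new=$(git ls-remote origin refs/heads/new-branch | cut -f1)
remote_main=$(git ls-remote origin refs/heads/main | cut -f1)
```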

When do you need force-push?

Whenever you have your Git call up some other Git and give them commits and then ask them to set one of their branch names, they will do some checking. The built-in checking, that all Gits do, is pretty simple. (GitHub adds a lot of features for things like "protected branches" that can do more checking. For this, consult the GitHub documentation. We'll ignore this here.) The basic check is just: does this add new commits, or does this take some commits away?

Let's draw a branch again. Suppose they have a branch that goes like this:

...--G--H--I   <-- main

Meanwhile, you have, in your repository:

...--G--H--I--J   <-- main

You run git push origin main. Your Git and their Git confer: you have a commit J that they don't have. Your Git hands it over. They inspect this and see that J connects back to I. You then have your Git ask their Git: Please, if it's OK, set your main to point to J.

Because J (or even JKL, if you had that) just adds on, it is OK. They say so and your git push finishes and they now have ...-GHIJ in their repository with their main selecting commit J. Your Git now updates your origin/main (your memory of origin's main), so your main and your origin/main both select commit J now.

But what if you have:

...--G--H--J   <-- main

because you never brought I in? Then your git push calls up their Git, sends over your J (it's still new to them), and politely asks them to set their main to point to J. They check: J links back to H, not I. If they set their main to point to J, they will drop I from their branch. That is not OK and they will reject your polite request, saying that it is not a fast-forward.

This is when you must use git push --force, if you really need them to drop a commit from this branch.
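The rejection and the forced update can both be reproduced locally. In this sketch a bare repository named remote.git plays the other Git, and two clones named theirs and ours (all assumed names) create the GHI versus GHJ situation.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/theirs"
git clone -q "$work/remote.git" "$work/ours"

cd "$work/theirs"
git commit -q --allow-empty -m "I"
git push -q origin main                   # remote main: ...--G--H--I

cd "$work/ours"
git commit -q --allow-empty -m "J"        # our main:    ...--G--H--J

# Polite request: refused, because it would drop I from their main.
if git push -q origin main 2>/dev/null; then rejected=no; else rejected=yes; fi
echo "plain push rejected: $rejected"

# Command instead of request: their main now points to J; I is dropped.
git push -q --force origin main
remote_main=$(git ls-remote origin refs/heads/main | cut -f1)
our_main=$(git rev-parse main)
```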

Putting it all together

You will want:

git clone <url1>                 # clone your fork; this checks out main
cd <new-clone>
git remote add upstream <url2>   # the fork parent
git fetch upstream               # creates upstream/main and so on
git branch new-branch main       # assuming main, not master
git reset --hard upstream/main   # you are on main: make it match the parent
git push origin new-branch       # remember the last commit from your old `main`
git push -f origin main          # drop commits

Before you run each of these, make sure that you understand what they do, and that I didn't goof any of this up. I don't have these two forks and hence can't test this exact set of commands.
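One way to rehearse the sequence safely is to run it against local stand-ins first. The sketch below is not a test against real GitHub forks: parent.git and fork.git are assumed local bare repositories, and the fork is made one commit ahead of the parent to mimic the situation in the question.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git clone -q --bare "$work/seed" "$work/parent.git"       # fork parent
git clone -q --bare "$work/parent.git" "$work/fork.git"   # your fork

# Put the fork's main ahead of the parent's by one commit.
git clone -q "$work/fork.git" "$work/tmp"
cd "$work/tmp"
git commit -q --allow-empty -m "J"
git push -q origin main

# The recipe itself, with local paths in place of the two URLs.
git clone -q "$work/fork.git" "$work/laptop"
cd "$work/laptop"
git remote add upstream "$work/parent.git"
git fetch -q upstream
old_main=$(git rev-parse main)
git branch new-branch main
git reset -q --hard upstream/main
git push -q origin new-branch
git push -q -f origin main

fork_main=$(git ls-remote origin refs/heads/main | cut -f1)
parent_main=$(git ls-remote upstream refs/heads/main | cut -f1)
fork_new=$(git ls-remote origin refs/heads/new-branch | cut -f1)
```

In this rehearsal the fork's main ends up equal to the parent's main, while new-branch on the fork keeps the commit that main used to select.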
