New to GitHub: Your branch and 'origin/master' have diverged, and have 1 and 2 different commits each, respectively. What should I do?

I'm pretty new to GitHub (I stick to add, commit, and push and haven't played around with new branches), and today I was trying to push some changes. However, I committed some files, realized I had messed something up, and tried to uncommit by running:

git reset --mixed HEAD~;

I tried pushing and resetting a few more times. I'm not exactly sure what I did, but I ended up here when checking git status:

Your branch and 'origin/master' have diverged,
and have 1 and 2 different commits each, respectively.

When I try to push, it states:

hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

So I think I'm now quite far behind, because some files that were tracked in the last two commits or so are now listed as untracked by git status. Furthermore, I don't want to lose any of the progress I've made locally on my computer. How can I fast-forward and push the changes I'd like to make, ideally without losing any past commits or current progress?

Let's separate these into their various components, so that you can properly understand what's going on. The components we care about here are:

  • Repositories: you have one on your computer, and GitHub has one on theirs.
  • Inside each repository:
    • commits
    • branch names

The commits literally get shared, while the branch names do not: they're handled in a fancier way. But the commits only get shared at specific connection points, when you have your Git call up the Git over at GitHub; your Git and their Git then have a little conversation about these branch names and the commits. Let's leave that for later, and concentrate on what's in your Git repository first: the commits, and the names.

Names

Usually I start with the commits first, but this time, let's start with just the names. There are more kinds of names than just branch names and we'll come back to that later, but for now, let's just worry about what a branch name is and does.

A branch name is just a name: a series of letters (preferably) and/or digits, with some rules that prevent you from using things like br..an..ch as a branch name but allow bra.nch. A branch name's main function is to hold one commit hash ID: the hash ID of the latest commit on that branch. So without the commits, names don't actually do you any good at all.

Commits

Commits are the real central feature of Git. Almost everything is about the commits. Commits save versions of your files forever (or at least for as long as those commits continue to exist), but it's important to understand how a commit does this, and how Git finds a commit.

Let's start this part with a simple idea: every commit is numbered. The numbers, however, aren't simple counting numbers. They're not commits 1, 2, and 3. Instead, each number is a big ugly hash ID. It looks totally random (though in fact it's entirely non-random). There's no obvious connection from the latest commit, whose number is, say, 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2, to its previous commit.
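To make "entirely non-random" concrete, here is a minimal Python sketch of how Git derives an object ID from content. It assumes the classic SHA-1 object format and hashes a blob (file contents) rather than a commit; a commit's ID is computed the same way over the commit's own text, with a `commit` header instead of `blob`. The function name `git_blob_id` is made up for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would give a blob with this content.

    Git hashes a short header ("blob", a space, the size in bytes, a NUL byte)
    followed by the raw content. git_blob_id is a hypothetical helper name.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
# Changing a single byte produces an unrelated-looking ID:
print(git_blob_id(b"hellO\n"))
```

You can compare these against `git hash-object` run on a file with the same bytes. Repositories that use the newer SHA-256 object format produce different, longer IDs.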

To find a commit, you need to know its number. But the numbers look random, and are too big and ugly for humans to remember. That's why we have the branch names: they remember the number of the last commit. But wait: what good is it to know only the last commit? Well, let's take a look at what's inside each commit.

Every commit has two parts. First, it has its data, which is a full snapshot of all of your files; we'll come back to this in a moment. Then it has some metadata, or information about the commit itself. In this metadata you will find the name of the person who made the commit, when they made the commit, and why they made the commit: their log message. But Git will also store, and find, one more piece of information that Git itself wants: the number (the hash ID) of this commit's parent commit.

The parent of any commit is the commit that comes before that commit. So for ordinary single-parent commits, which tend to be most of them, this means that Git can start from the last commit and simply work backwards. That's exactly what Git does, and we can draw it like this:

A <-B <-C   <--master

Here we have a simple repository with just three commits in it, all on one branch named master. The name master holds the hash ID of the last commit (which we're calling C, even though it has some big ugly hash ID), and that commit C holds the hash ID of earlier commit B. So Git can use the name to find C, and then use C to find B.

Meanwhile B holds the hash ID of earlier commit A, so having found B, Git can find A. A is the very first commit anyone ever made, so it has no parent. That lets Git stop working backwards.
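The backwards walk just described fits in a few lines. This is a minimal Python sketch of a toy repository, assuming single-letter stand-ins for hash IDs and commits that record only their parent (real commits also carry the snapshot and metadata). The names `parents`, `branches`, and `log` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy repository: letters stand in for real hash IDs, and each
# commit records only its parent (None for the very first commit).
parents: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B"}
branches: Dict[str, str] = {"master": "C"}  # name -> hash ID of the last commit

def log(branch: str) -> List[str]:
    """Start at the commit the branch name holds and follow parent links backwards."""
    result: List[str] = []
    commit: Optional[str] = branches[branch]
    while commit is not None:  # A has no parent, so the walk stops there
        result.append(commit)
        commit = parents[commit]
    return result

print(log("master"))  # ['C', 'B', 'A']
```

This mirrors, very loosely, what `git log` does on a history with no merges.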

Commits and branches

There's one more interesting wrinkle here, which occurs once we have more than one branch. Let's say we're up to eight commits in our repository:

...--G--H   <-- master

I've stopped bothering to draw the backwards arrows between commits. That's OK because all commits necessarily point backwards, and there's one other key thing about commits: once you make a commit, nothing in it can ever change.[1] So the backwards-pointing arrow is frozen, and there is no way to add a forwards-pointing arrow. That's not the case for the branch names, though: remember, master used to contain the real hash ID of commit C; now it contains the actual hash ID of commit H.

If we make a new branch name now, the new name will also point to commit H.[2] Let's draw that:

...--G--H   <-- develop, master

Which name are we actually using? Git has an answer for us: we attach the special name HEAD, written in all uppercase like this,[3] to the branch we want to be using. So:

...--G--H   <-- develop, master (HEAD)

means we are using the name master, which selects commit H, while:

...--G--H   <-- develop (HEAD), master

means we are using the name develop, which also selects commit H.
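The two-step lookup (HEAD to branch name, branch name to commit) can be sketched in Python as follows. The class `Refs` and the single-letter IDs are hypothetical; in a real repository, HEAD is a small file that typically contains something like `ref: refs/heads/master`.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Refs:
    # Hypothetical toy model: branch name -> hash ID of that branch's last commit.
    branches: Dict[str, str] = field(default_factory=dict)
    # HEAD holds a branch *name*, not a commit hash ID.
    head: str = "master"

    def current_commit(self) -> str:
        """Follow HEAD to its branch name, then the branch name to its commit."""
        return self.branches[self.head]

# Toy state from the drawings: develop and master both select commit H.
refs = Refs(branches={"develop": "H", "master": "H"}, head="master")
print(refs.head, refs.current_commit())  # master H
refs.head = "develop"
print(refs.head, refs.current_commit())  # develop H
```

Both lines select the same commit; only the attached name differs, and that difference matters as soon as a new commit is made.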


[1] The reason it's frozen like this is that the commit's number, its hash ID, is built by computing a cryptographic hash of all of the bits in the data and metadata. This means that it's literally impossible to change a commit. If you take one out, turn it into ordinary data, change a bit, and write it back, you get a new and different commit. The original commit remains in the repository; you've merely added a new, different commit.

[2] Actually, we can pick any existing commit for the new name to point to. We must pick some existing commit, though: you are not allowed to have a branch name that doesn't point to an existing commit.

[3] On some systems, you can type head in lowercase and have it work. This is a bad habit to get into because:

  • it does not work on all systems, and
  • it breaks when you start using git worktree.

If you don't like typing out the word HEAD, consider using the one-character synonym @.


Making new commits, part 1

One place where HEAD matters a lot is when we go to make a new commit. Suppose we have:

...--G--H   <-- develop (HEAD), master

Either way we're using commit H. But if we change some files, git add them, and git commit, this tells Git to make a new commit. Let's call the new commit I and draw it in, like this:

...--G--H
         \
          I

Note that the parent of new commit I is existing commit H. That's because we started from commit H.

What happens to the branch names? The answer is: Git automatically updates the name to which HEAD is attached. Since HEAD was and still is attached to develop, that's the name that now points to new commit I:

...--G--H   <-- master
         \
          I   <-- develop (HEAD)

If we now git checkout master to go back to master, Git will return us to existing commit H, attach the special name HEAD to master, and give us:

...--G--H   <-- master (HEAD)
         \
          I   <-- develop
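Here is a minimal Python sketch of the rule "a new commit's parent is the current commit, and only the branch HEAD is attached to moves". It is a toy model, not how Git is implemented: the class `Repo` is hypothetical, the caller supplies the new commit's ID instead of Git computing a hash, and files are ignored entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Repo:
    # Hypothetical toy model; commit IDs are caller-supplied letters.
    parents: Dict[str, Optional[str]] = field(default_factory=dict)  # commit -> parent
    branches: Dict[str, str] = field(default_factory=dict)           # name -> last commit
    head: str = "master"                                             # attached branch name

    def commit(self, new_id: str) -> None:
        """Make a commit whose parent is the current commit, then move only
        the branch name that HEAD is attached to."""
        self.parents[new_id] = self.branches[self.head]
        self.branches[self.head] = new_id

    def checkout(self, name: str) -> None:
        """Attach HEAD to an existing branch name (work-tree not modeled)."""
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name

repo = Repo(parents={"G": None, "H": "G"},
            branches={"develop": "H", "master": "H"}, head="develop")
repo.commit("I")           # new commit I, whose parent is H
print(repo.branches)       # {'develop': 'I', 'master': 'H'}
repo.checkout("master")
print(repo.head, repo.branches[repo.head])  # master H
```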

Commits are read-only, so how do files work?

We mentioned earlier that commits are frozen-in-time snapshots of all of your files. To make this work well, Git stores each file in a special, read-only, Git-only, frozen and de-duplicated format. Only Git itself can use these files. So when we pick some commit to use, Git copies the files out of the commit into a work area. That work area, with ordinary everyday files in it, is your work-tree or working tree.

The de-duplication means that even though each commit has a full snapshot of every file, most of those snapshots are simply re-using existing files. That is, when we made commit I, we probably changed one or two files and left all the others the same. So commit I and commit H actually share most of their files. They probably share most of those with earlier commits, too, up to some point. (In fact, if you change a file back to the way it was in some earlier commit, the new file is automatically shared with the older commit.)
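De-duplication falls out of naming content by its hash. The following minimal Python sketch assumes a content-addressed store keyed by blob ID; the names `store`, `blob_id`, and `snapshot` are hypothetical, and snapshots are plain dicts here, whereas real Git stores them as tree objects.

```python
import hashlib
from typing import Dict

store: Dict[str, bytes] = {}  # object ID -> content; each distinct content stored once

def blob_id(content: bytes) -> str:
    """SHA-1 of Git's blob header ("blob <size>" plus NUL) followed by the content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

def snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record a full snapshot: every path maps to a blob ID, but the store
    only grows when a file's content has never been seen before."""
    tree: Dict[str, str] = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # no-op if this exact content is already stored
        tree[path] = oid
    return tree

tree_h = snapshot({"README": b"hi\n", "main.c": b"int main(void) { return 0; }\n"})
tree_i = snapshot({"README": b"hi\n", "main.c": b"int main(void) { return 1; }\n"})
print(len(store))  # 3: README is shared; only main.c's new version was added
# Changing main.c back re-uses the blob from the older snapshot.
tree_j = snapshot({"README": b"hi\n", "main.c": b"int main(void) { return 0; }\n"})
print(len(store), tree_j == tree_h)  # 3 True
```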

That's the data inside each commit: a full, frozen snapshot of all of your files, or rather, of all the files you told Git to put into that snapshot. So which files are those?

Making new commits, part 2

Each commit holds these frozen-format files, which need to be expanded out into your work-tree. You might assume, then, that git commit takes whatever is in your work-tree and commits it. But that's not in fact how Git works.

Besides the original, in-commit, frozen files, and the in-work-tree, everyday files, Git keeps an intermediate copy of each file.[4] This extra copy is in an area that is either so important, or so badly named originally, that it has three names. Git calls it the index, or the staging area, or sometimes (rarely these days) the cache. I'll use the term index here, but remember that staging refers to these extra copies.

The index copy of each file is in the frozen format, ready to go into the next commit. So a good way to think of Git's index is that it contains the proposed next commit. The git add command tells Git: copy the work-tree, ordinary-format copy of the file into, or back into, the index, replacing any previous copy. This also prepares it in the frozen format (de-duplicating it too) so that it's ready for the next git commit.

When you do run git commit, Git gathers any extra information it needs for the metadata, such as your name and log message. It then writes out whatever is in the index right then as the snapshot, adds the metadata (including the current commit as the new commit's parent), makes the new commit, and finally updates whichever branch name HEAD is attached to.
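The key point, that git commit snapshots the index rather than the work-tree, can be sketched like this. It is a toy Python model under stated assumptions: `Repo` and `Commit` are hypothetical classes, files are path-to-bytes dicts, and the caller supplies the commit ID.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Commit:
    parent: Optional[str]
    snapshot: Dict[str, bytes]  # frozen copy of the index at commit time
    message: str

@dataclass
class Repo:
    # Hypothetical toy model of a repository with an index and a work-tree.
    commits: Dict[str, Commit] = field(default_factory=dict)
    branches: Dict[str, str] = field(default_factory=dict)
    head: str = "master"
    index: Dict[str, bytes] = field(default_factory=dict)      # the proposed next commit
    work_tree: Dict[str, bytes] = field(default_factory=dict)  # ordinary, editable files

    def add(self, path: str) -> None:
        """Like `git add path`: copy the work-tree version into the index."""
        self.index[path] = self.work_tree[path]

    def commit(self, new_id: str, message: str) -> None:
        """Snapshot the index (not the work-tree), then move HEAD's branch name."""
        parent = self.branches.get(self.head)  # None for the very first commit
        self.commits[new_id] = Commit(parent, dict(self.index), message)
        self.branches[self.head] = new_id

repo = Repo()
repo.work_tree["a.txt"] = b"v1\n"
repo.add("a.txt")
repo.work_tree["a.txt"] = b"v2\n"  # edited again, but not re-added
repo.commit("C1", "first commit")
print(repo.commits["C1"].snapshot)  # {'a.txt': b'v1\n'}
```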

If you like to use git commit -a, be aware that this just makes git commit update files that are already there in the index. It's very nearly equivalent to running git add -u (update known files) followed by git commit.[5]

You can't see the index directly,[6] but git status tells you, implicitly, what's in the index. The way git status works is also pretty simple:

  • You have a current commit. That's the one that your branch name (found by looking at HEAD) says is the last commit. That commit has a bunch of files in it, in Git's special, internal, frozen format. Git also calls this the HEAD commit.

  • Git has its index. That has a bunch of files in it, in Git's special frozen format, but unlike the ones in the commit, they can be replaced with new copies.

  • And you have your work-tree, where you can do anything you want, including creating all-new files.

The git status command does two separate comparisons:

  • First, it compares all the files in the HEAD commit to all the files in the index. For every file that is the same, it says nothing at all. For every file that is different (including files that are new or gone), it says that this file is staged for commit.

  • Then git status compares all the files in Git's index to the files in your work-tree. For every file that is the same (after expanding out of frozen form), it says nothing at all. For every file that is different, it says that this file is not staged for commit.

This means you can view what can be updated in the index, without having to view every file in the index that's the same as the copies in the HEAD commit and your work-tree.
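The two comparisons are easy to model. This minimal Python sketch assumes each of the three places (HEAD commit, index, work-tree) is a path-to-content dict; the function name `status` is hypothetical, and untracked files are deliberately left out here.

```python
from typing import Dict, List, Tuple

Files = Dict[str, bytes]  # path -> file content (toy stand-in for frozen blobs)

def status(head: Files, index: Files, work_tree: Files) -> Tuple[List[str], List[str]]:
    """Return (staged, not_staged) paths using git status's two comparisons."""
    # 1: HEAD commit vs index; new, changed, or removed files are "staged for commit".
    staged = sorted(p for p in head.keys() | index.keys() if head.get(p) != index.get(p))
    # 2: index vs work-tree; differences are "not staged for commit".
    not_staged = sorted(p for p in index if work_tree.get(p) != index[p])
    return staged, not_staged

head = {"a.txt": b"1", "b.txt": b"1"}
index = {"a.txt": b"2", "b.txt": b"1", "c.txt": b"new"}
work_tree = {"a.txt": b"2", "b.txt": b"3", "c.txt": b"new"}
print(status(head, index, work_tree))  # (['a.txt', 'c.txt'], ['b.txt'])
```

Files identical in all three places never appear in either list, which is why git status stays short even in a big repository.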


[4] Technically, what's in the index is not a copy of the file, but rather a reference to a frozen-format Git internal blob object. But you don't normally need to know this: it only matters if you start using git ls-files --stage and git update-index to work with Git's low-level index.

[5] The main difference here is that if making the new commit fails, the git commit -a method rolls the index back. Using git add -u is a separate step, so if the add works and the commit fails, the index is still updated. There are a bunch of more-subtle distinctions too, but we'll ignore all the tricky corner cases here. The important point is that Git makes commits from an index, and usually there's only one index (the index), and everything else flows from that.

[6] Actually, you can see what's in the index: run git ls-files --stage. Note that this dumps a lot of output in a big repository. This command is not one you'd normally use: it's meant for Git programs to use internally, not for users.


Tracked and untracked files

Now that you know that Git makes its commits from its index, you can finally understand tracked and untracked files correctly. The definition of a tracked file is wonderfully simple, yet still complicated: a tracked file is one that is in Git's index right now.

You can add files to the index at any time: git add newfile. That file is now tracked. You can remove files from the index any time, too: git rm --cached oldfile. The --cached option prevents git rm from removing your work-tree copy, so that you can still see the file, but it's no longer in Git's index: that file is now untracked.

But remember: git checkout branch tells Git to fill its index and your work-tree from some existing commit. So Git will update its index on its own. If there are files right now in Git's index and your work-tree, and you git checkout a commit that doesn't have those files, Git will remove those files from its index and your work-tree, so that you'll see what is saved in that commit.

An untracked file is any file that is in your work-tree, but not in Git's index. When you have such a file (one not in Git's index) and git checkout some other commit that also doesn't have that file, that file continues not to be in Git's index, and hence continues to be untracked.

(A sneaky case here occurs when you have an untracked file, then ask to switch to some commit that does have that file. We won't worry about that here, but you can probably see how this can be a problem.)
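Since "tracked" just means "in the index", both definitions reduce to set arithmetic. A minimal Python sketch, assuming toy path-to-content dicts; `untracked` and `rm_cached` are hypothetical helper names, the second mimicking what git rm --cached does to the index.

```python
from typing import Dict, Set

def untracked(index: Dict[str, bytes], work_tree: Dict[str, bytes]) -> Set[str]:
    """A file is untracked exactly when it is in the work-tree but not in the index."""
    return set(work_tree) - set(index)

def rm_cached(index: Dict[str, bytes], path: str) -> None:
    """Like `git rm --cached path`: drop the index copy, keep the work-tree file."""
    del index[path]

index = {"a.txt": b"A", "old.txt": b"O"}
work_tree = {"a.txt": b"A", "old.txt": b"O", "notes.txt": b"N"}
print(sorted(untracked(index, work_tree)))  # ['notes.txt']
rm_cached(index, "old.txt")
print(sorted(untracked(index, work_tree)))  # ['notes.txt', 'old.txt']
```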

Getting rid of commits

Commits are actually very hard to get rid of (short of removing the entire .git directory, which loses everything and is rarely a good idea). That's because Git is built to add new commits, not to drop them. But you can in fact get rid of commits.

Suppose we have some branch name and some series of commits:

...--G--H   <-- master (HEAD)

Now suppose further that we could convince Git that the name master should hold, not the hash ID of commit H, but that of commit G instead, like this:

       H
      /
...--G   <-- master (HEAD)

Note that commit H is actually still there, in the repository. But Git shows us commits by starting from the commit whose ID is stored in a name, like master. The name now says commit G is the latest commit. Commit G points back to some earlier commit (F, probably). So if we ask Git about the commits in this repository, we won't see commit H any more.

(We can, when we draw these, push the "discarded" commits up or down to get them out of the way. I'm a bit limited by StackOverflow text conventions, but if you draw these on paper or a whiteboard, feel free to draw them any which way, including long swoopy arrows from branch names to commits.)

Note that this only works for "tail" commits. That is, suppose we have:

...--G--H   <-- master (HEAD)
         \
          I   <-- develop

If we force the name master to point to commit G, well, commit I still points to commit H, so what we get is this:

...--G   <-- master (HEAD)
      \
       H--I   <-- develop

That is, now it looks like we made commit H on branch develop. Still, branch master now ends at commit G, so we've definitely done something.
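Whether a commit is still visible depends only on whether some name can reach it by walking backwards. A minimal Python sketch, assuming a toy single-parent graph where F stands in for the start of history; `parents` and `reachable` are hypothetical names.

```python
from typing import Dict, Optional, Set

# Hypothetical toy graph: F is treated as the first commit.
parents: Dict[str, Optional[str]] = {"F": None, "G": "F", "H": "G", "I": "H"}

def reachable(branches: Dict[str, str]) -> Set[str]:
    """All commits Git can find by starting at branch names and walking parents."""
    seen: Set[str] = set()
    for tip in branches.values():
        commit: Optional[str] = tip
        while commit is not None and commit not in seen:
            seen.add(commit)
            commit = parents[commit]
    return seen

# master moved back from H to G, but develop still reaches H through I.
print(sorted(reachable({"master": "G", "develop": "I"})))  # ['F', 'G', 'H', 'I']
# With only master, H and I can no longer be found from any name.
print(sorted(reachable({"master": "G"})))  # ['F', 'G']
```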

This is what git reset does. When you ran:

git reset --mixed HEAD~

you told your Git: Find the commit that's one step back from HEAD, one back from the last commit on the current branch. Then, force the current branch name to identify that commit. If you had:

...--G--H--I   <-- master

and you do this once, you end up with:

...--G--H   <-- master
         \
          I

If you do it again, the name master points to commit G, and commits H and I are left dangling. In your repository, they'll stick around for a while by default: user repositories get at least 30 days to get these commits back. (The mechanism here is something Git calls reflogs, but we won't go into details.)

The --mixed argument to git reset tells Git to leave the work-tree untouched while moving things around. So the copies of files that are in the work-tree are left alone. With --hard, git reset adjusts those too. With --soft, git reset leaves Git's index alone, but with --mixed, Git empties out the old index and fills it in from the commit you select.

This (replacing the index, but leaving the work-tree alone) can easily lead to the untracked-files case. In particular, suppose commit I added a new file that is not in commit H. Then the reset above removes the new file from Git's index, leaving the new file in your work-tree. This file is now in your work-tree, but not in Git's index, and that is the very definition of an untracked file.
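The three modes differ only in how much they touch after moving the branch name. This is a minimal Python toy model, not Git's implementation: `State` and `reset` are hypothetical names, snapshots are path-to-content dicts, and corner cases (conflicts, locally modified files that --hard would discard) are simplified away.

```python
from dataclasses import dataclass
from typing import Dict

Snapshot = Dict[str, bytes]  # path -> file content

@dataclass
class State:
    branch_tip: str      # commit the current branch name points to
    index: Snapshot      # proposed next commit
    work_tree: Snapshot  # ordinary files you can see and edit

def reset(state: State, target: str, snapshots: Dict[str, Snapshot], mode: str = "mixed") -> None:
    """Toy model of `git reset --soft/--mixed/--hard <target>`."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    snapshot = snapshots[target]
    previously_tracked = set(state.index)
    state.branch_tip = target                # every mode moves the branch name
    if mode in ("mixed", "hard"):
        state.index = dict(snapshot)         # refill the index from the target commit
    if mode == "hard":
        for path in previously_tracked - set(snapshot):
            state.work_tree.pop(path, None)  # tracked files the target lacks are removed
        state.work_tree.update(snapshot)     # untracked files are left alone

# Commit I added new.txt, which commit H does not have.
snapshots = {"H": {"a.txt": b"1"}, "I": {"a.txt": b"1", "new.txt": b"n"}}
s = State("I", dict(snapshots["I"]), dict(snapshots["I"]))
reset(s, "H", snapshots, "mixed")
print(sorted(set(s.work_tree) - set(s.index)))  # ['new.txt']: now untracked
```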

Keep in mind that all the committed files are safe in those commits, as long as you can still find those commits. By making a commit like commit I hard to find, you've set things up so that you might not be able to get those versions of your files back easily. But any commits that git log shows you are easy to find. (We've skipped over the idea of using Git's detached HEAD mode to look at a historic commit, so as not to have to cover that mode, but that's one way to see historic versions.)

Add more Git repositories

Now that you know how commits and branch names work in your repository, including adding new commits and resetting some commits away, it's time to add the GitHub Git into the mix.

For your Git to call up some other Git, it needs a URL (something like ssh://git@github.com/... or https://github.com/...). Your Git will save this URL for you, under a nice short memorable name. Git calls this a remote. Many Git repositories have only one remote, called origin, and I will assume that's the case for yours too.

To have your Git connect with this other Git, you will run one of three commands: git fetch, git push, or git pull. The git pull command is just a convenience wrapper that runs git fetch first, then a second Git command, and it's best (well, I think it's best) to learn git fetch separately. So that gives us just two commands that make Gits talk to each other.

The two commands on their own are relatively simple:

  • git fetch has your Git call up their Git and ask them what they have that you don't. They list out their branch names (and other names) and their commit hash IDs. Your Git can immediately tell whether you have these commits, because the hash IDs are the same in every Git repository (see footnote 1 again). If you don't have the commits, your Git asks their Git to send them over. They do, and now you have the commits too.

    Now that you have all the commits that they have (plus any of your own that you have not shared), your Git creates or updates your origin/* names, to remember what they have as their branch names. Each of your origin/* names is a remote-tracking name.[7] These are just your Git's memory of what hash IDs they had in which branch names, at the time you ran git fetch.

    If they don't change their branch names (ever, or often), your git fetch will set your remote-tracking names correctly every time. If they do change them fairly often, you need to run git fetch fairly often to pick up any new names and different commit hash IDs.

    You can run git fetch just like this, with no arguments at all.

  • git push has your Git call up their Git and then give them new commits if needed. This is a little bit more complicated than git fetch, because for them to remember any new commits, they will have to update their branch names. They do not have the equivalent of remote-tracking names for you.

    As with git fetch, your Git lists out commit hash IDs. They check to see if they have those IDs. If not, they have your Git send over those commits (and their files if necessary; there's a lot of fancy stuff in here to avoid sending files they already have copies of). As before, the fact that every Git uses the same hash IDs for identical commits makes this easy.

    Then, once their Git has any commits required, your Git sends over one or more requests: Please, if it's OK, set your branch name ______ (fill in a name) to ______ (fill in a hash ID). It's up to them whether to obey this polite request. Or, your Git can send over a command: Set your name _____ to _____! It's still up to them whether to obey.

    The git push command requires[8] that you put in the name origin, because long ago, someone decided that the syntax would be git push remote branch, so you have to put the origin in there before you put in the branch name. Then, git push needs the name of the branch. This tells your Git which commit(s) to send (your Git finds the last commit from your branch name as usual) and fills in the two blanks. That is, we have to put both the branch name and the hash ID into the two blanks, for the polite request or forceful command. Your Git gets both of these from the name you put in here.[9]

    They tell our Git whether they obeyed the command. If so, our Git updates the one remote-tracking name corresponding to their branch name. That is, if we got them to update their master, our Git updates our memory stored in origin/master. Since we didn't find out about any of their other branch names, none of our other origin/* names get updated.
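The fetch side of this conversation can be sketched as a set difference followed by a rename of their branch names into our remote-tracking names. A minimal Python toy model, assuming commits are just hash-ID strings; `fetch` is a hypothetical function, and note that it never touches our own branch names.

```python
from typing import Dict, Set

def fetch(local_commits: Set[str], their_commits: Set[str],
          their_branches: Dict[str, str], remote: str = "origin") -> Dict[str, str]:
    """Toy `git fetch`: bring over only the commits we lack (hash IDs are the
    same in every repository, so a set difference is enough), then return the
    remote-tracking names that remember their branch names."""
    missing = their_commits - local_commits
    local_commits.update(missing)  # "they send them over"
    return {f"{remote}/{name}": tip for name, tip in their_branches.items()}

mine = {"G", "H", "K"}          # our master is at K
theirs = {"G", "H", "I", "J"}   # their master is at J
tracking = fetch(mine, theirs, {"master": "J"})
print(tracking)      # {'origin/master': 'J'}
print(sorted(mine))  # ['G', 'H', 'I', 'J', 'K']
```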


[7] Git calls these things remote-tracking branch names, but I find that the word branch here makes things more confusing, not less; so now I leave it out and just call them remote-tracking names.

[8] You can set things up so that git push defaults to pushing the current branch, and then you can leave this out (Git will figure out both the right remote and the current branch), but I like to show the explicit version.

[9] There are additional options available so that you can get fancier. For instance, you can push from your name grandpa-simpson to their name onion-on-my-belt if you like. It's possible to use totally different names on each side. But don't do that without a strong need: it rapidly becomes very confusing.


Fast-forwards and non-fast-forwards

Let's imagine, now, that we're a Git that is receiving a git push. Some other Git has called us up and asked us if we have commit a123456. We don't, so they gave it to us. a123456 has parent 9876543, which we do have, so that's the only commit we need. Now they say: Please, if it's OK, set your master to a123456.

Let's draw what we have:

...--G--H   <-- master

Suppose the hash ID of commit H is 9876543. Then the new commit a123456 is apparently a new commit I that just adds on to our existing master, and we can put it in like this:

...--G--H--I   <-- master

But what if the parent, 9876543, is not commit H? What if it's commit G? That is, we have:

...--G--H   <-- master

and they've given us:

...--G--H
      \
       I

and they are now asking us to set our master to remember commit I? If we do that, we'll lose our commit H. We will end up with:

       H
      /
...--G--I   <-- master

and we won't be able to find commit H any more. So we'll say no to a polite request, because this operation doesn't just add commits to our master; it also drops a commit.

If they send us a forceful command (set your master to a123456!), we'll probably obey it, and drop commit H. If they didn't keep a copy of commit H, it could go away pretty fast. Server-side repositories often have no reflogs, and an abandoned (dangling) commit can be removed almost immediately.
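The receiving Git's decision boils down to an ancestry test: a polite update is a fast-forward only if the old tip can be reached by walking backwards from the new tip. A minimal Python toy model, assuming single-parent commits; `is_ancestor`, `receive_push`, and the commit name `I2` (the drawing's second commit I, whose parent is G) are hypothetical, and real servers may apply further checks such as hooks or branch protection.

```python
from typing import Dict, Optional

def is_ancestor(parents: Dict[str, Optional[str]], old: str, new: str) -> bool:
    """True if walking backwards from `new` reaches `old` (single-parent toy graph)."""
    commit: Optional[str] = new
    while commit is not None:
        if commit == old:
            return True
        commit = parents[commit]
    return False

def receive_push(branches: Dict[str, str], parents: Dict[str, Optional[str]],
                 name: str, new_tip: str, force: bool = False) -> bool:
    """Toy receiving side of `git push`: accept a polite request only if it is a
    fast-forward (old tip is an ancestor of the new one); accept a forced one."""
    old_tip = branches.get(name)
    if old_tip is None or force or is_ancestor(parents, old_tip, new_tip):
        branches[name] = new_tip
        return True
    return False

# I has parent H (the current tip); hypothetical I2 has parent G instead.
parents = {"G": None, "H": "G", "I": "H", "I2": "G"}
print(receive_push({"master": "H"}, parents, "master", "I"))               # True
print(receive_push({"master": "H"}, parents, "master", "I2"))              # False
print(receive_push({"master": "H"}, parents, "master", "I2", force=True))  # True
```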

Your own situation

We can draw a picture of your situation, in which your master is ahead 1 and behind 2 relative to your origin/master, your Git's memory of their Git's master. It might look like this:

          I--J   <-- origin/master
         /
...--G--H
         \
          K   <-- master (HEAD)
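The "ahead 1, behind 2" numbers come from comparing the two histories: commits reachable only from master, and commits reachable only from origin/master. A minimal Python sketch of the drawing above, assuming single-parent toy commits; `history` and `ahead_behind` are hypothetical names. Git itself can report the same pair with `git rev-list --left-right --count master...origin/master`.

```python
from typing import Dict, Optional, Set, Tuple

def history(parents: Dict[str, Optional[str]], tip: str) -> Set[str]:
    """Every commit reachable by walking backwards from `tip`."""
    seen: Set[str] = set()
    commit: Optional[str] = tip
    while commit is not None:
        seen.add(commit)
        commit = parents[commit]
    return seen

def ahead_behind(parents: Dict[str, Optional[str]], local: str, upstream: str) -> Tuple[int, int]:
    """(commits only on local, commits only on upstream)."""
    mine, theirs = history(parents, local), history(parents, upstream)
    return len(mine - theirs), len(theirs - mine)

# The drawing: master at K, origin/master at J, both growing from H.
parents = {"G": None, "H": "G", "I": "H", "J": "I", "K": "H"}
print(ahead_behind(parents, "K", "J"))  # (1, 2)
```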

You can, if you like, use git push --force origin master to send them your commit K and tell them to abandon their commits I and J. But if you want to keep those commits, don't do that.

You can, if you like, abandon your own commit K:

git reset [options] origin/master

will give you:

          I--J   <-- master (HEAD), origin/master
         /
...--G--H
         \
          K   [abandoned]

Your commit K will stick around for a while, though finding it will be a bit hard. If you don't want it after all, that's probably fine.

You can use git merge to combine the changes from commit H to commit K with the changes from commit H to commit J, making a new merge commit. You can use git rebase to copy existing commit K to a new and different commit that adds on to commit J. There are a lot of things you can do, and each one produces a different set of resulting commits. Remember what a commit is and does for you, and how branch names find commits. Then decide which commits you want and which ones you would like to pretend never happened (to remove with git reset and/or git push --force), and set things up in your own local repository the way you want. Finally, use git push, with or without --force, to send new commits to the GitHub Git and get them to set their branch names to point to the new commits, matching your own setup in your Git.
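The difference between the merge and rebase options shows up directly in the commit graph. This minimal Python toy model assumes each commit ID is a hash of its parent IDs plus its message, only to show that a copy with a new parent must get a new ID; real Git hashes the whole commit object (snapshot, author, timestamps, and so on). The names `graph` and `add_commit` are hypothetical.

```python
import hashlib
from typing import Dict, Tuple

graph: Dict[str, Tuple[str, ...]] = {}  # toy commit ID -> tuple of parent IDs

def add_commit(parents: Tuple[str, ...], message: str) -> str:
    """Create a toy commit whose ID hashes its parent IDs and message, so the
    same message on a different parent yields a different ID."""
    data = ("\n".join(parents) + "\0" + message).encode()
    cid = hashlib.sha1(data).hexdigest()[:7]
    graph[cid] = parents
    return cid

# Rebuild the diverged history: H, then I--J on origin/master, and K on master.
h = add_commit((), "H")
i = add_commit((h,), "I")
j = add_commit((i,), "J")
k = add_commit((h,), "K")

# Option 1, merge: one new commit with two parents, K and J.
m = add_commit((k, j), "Merge origin/master")
# Option 2, rebase: a copy of K whose parent is J; the original K is left behind.
k_copy = add_commit((j,), "K")

print(graph[m] == (k, j))  # True
print(k_copy != k)         # True: same message, new parent, new ID
```

After a merge, master would point at `m` and a plain git push is a fast-forward for them; after a rebase, master would point at `k_copy`, which also fast-forwards their J. Either way no --force is needed.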
