
Git fetch and git pull relationship

When I look up the difference between git pull and git fetch, many sources say that git pull is a superset of fetch, i.e., git pull is fetch + merge.

However, I seem to remember many times where git pull told me that everything was up to date, but fetch yielded new information.

Can someone explain this discrepancy between theory and reality?

Pull is indeed fetch plus merge.

Except when it's not.

When isn't it? When it's fetch plus rebase, or—very rarely—fetch plus checkout. But in all three cases, it's still:

  1. git fetch, followed by
  2. some second Git command to do something with the fetched commits.
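To see that structure concretely, here is a throwaway script. It is a sketch, not anything taken from Git's own sources. It assumes Git 2.32 or later (for git init -b and GIT_CONFIG_GLOBAL) and uses local directories instead of real URLs. The commit_file helper and all directory names are hypothetical. Clone a runs git pull. Clone b runs git fetch followed by git merge of its upstream. Both should end up on the same commit.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

# A local "upstream" repository stands in for the real remote.
git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git clone -q "$work/upstream" "$work/a"
git clone -q "$work/upstream" "$work/b"
commit_file "$work/upstream" new.txt        # upstream moves ahead

# Clone a: one command.
git -C "$work/a" pull -q

# Clone b: the same two steps, spelled out.
git -C "$work/b" fetch -q                   # step 1: update origin/main
git -C "$work/b" merge -q '@{upstream}'     # step 2: merge the branch's upstream

git -C "$work/a" log --oneline -1
git -C "$work/b" log --oneline -1
```

Replacing the final merge with git rebase '@{upstream}' would mimic a pull that is configured to rebase.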

Where this gets complicated is not so much the second command (though that does complicate things) but rather the arguments passed along by git pull. Since git pull runs two other Git commands, and each Git command's actions depend on its options and arguments, it matters which options and arguments git pull passes to git fetch and to the second command, whatever that may be.

Aside: a look into history

In the early days of Git, there were no "remotes" like origin, which meant there were no "remote-tracking names" either. You would run:

git fetch git://name-of-linus-torvalds-machine/repos/foo.git

to get stuff from Linus and then run git merge FETCH_HEAD, or something along those lines. This was error-prone (it was easy to make a typo in the URL) and annoying, so Git acquired a bunch of temporary methods to deal with it.
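That old workflow still works. The sketch below assumes Git 2.32 or later. A local directory named linus is a hypothetical stand-in for Linus's URL, and commit_file is a hypothetical helper. The clone's origin remote is removed first, so after the fetch the only record of what arrived is FETCH_HEAD.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

# "linus" is a hypothetical stand-in for git://name-of-linus-torvalds-machine/...
git init -q -b main "$work/linus"
commit_file "$work/linus" README
git clone -q "$work/linus" "$work/old"
git -C "$work/old" remote remove origin     # no remotes, no remote-tracking names
commit_file "$work/linus" patch.txt

cd "$work/old"
git fetch -q "$work/linus" main             # a URL (here a path), not a remote name
cat "$(git rev-parse --git-path FETCH_HEAD)"  # the only record of what came in
git merge -q --ff-only FETCH_HEAD
```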

Note that with no remotes, all git fetch could do was leave a bunch of information in .git/FETCH_HEAD so that you could figure out which branches in Linus's repository had been updated and so on. And of course, git pull wrapped these two commands into one, so that you didn't have to run two separate commands, and most people used git pull. But something was clearly missing. So remotes were invented:

  • We now had a short, simple name like origin that we could use instead of a URL. (This got rid of the need for all the weird hacks for naming remotes, though they are all still listed in the documentation. Look for Named file in $GIT_DIR.)
  • We now had a way for Git to save the hash IDs associated with Linus's latest versions, so that we didn't need to create lots of branches locally. The remote-tracking names (origin/master and the like) take over a job that in the past would have required a local branch name.

But all these things are still supported and some of them are still described as "the way to do things" in some (ancient) documents, so you can still use the old crude methods. Perhaps some do.

In any case, remote-tracking names now exist. However, between Git 1.7 and Git 2.0, there were some updates to them. Specifically, Git 1.8.4 changed git fetch origin master so that it also updates origin/master. The old behavior, which left origin/master untouched, was eventually declared to be a bug. Some people are still using Git 1.7.x for some strange reason, so be aware that you could hit the old behavior.

In Git 2.11, the old git pull shell script was formally retired. While git pull still effectively runs git fetch followed by a second Git command, you can no longer point to the shell script and say: "See, here at this line, it runs git fetch. Then it has these tests and then it eventually runs this other command..." The result is that it runs much faster on Windows, and is much harder to explain. 😀 It has also gained a feature or two since then, enough that at least a few hardcore "anti-pull" people like me are now willing to actually use the thing. But that's another story.

How you run git pull

The git pull command has a lot of options. See its documentation for the complete list, then compare these options to those for git fetch, git rebase, and git merge. Note that the pull documentation says that some options are passed to one command, some to the other, and some to both. There is also a fair bit of overlap: for example, all of them take -q for quiet and -v for verbose.

With or without these options, though, you can run:

git pull

or:

git pull origin

or:

git pull origin main

for example. If and when you run any of these, all of the positional arguments are passed to git fetch.

Note that you can even run:

git pull origin main feature

but you almost certainly should not. We'll cover why later.

Options, if you give them, are passed as described to one or both of the fetch and second-command steps.

The fetch command is always passed one extra option, namely --update-head-ok. Pull needs to pass this option, but it also needs to be careful, because careless use of it can get your current branch, index, and working tree out of sync. Do not use this option yourself unless you know exactly what you are doing.

For (at least, and maybe only) historical reasons, when given refspec arguments, such as main in the git fetch origin main case, git fetch will update only the specified refspecs and their associated remote-tracking names. Since git pull passes all the refspec arguments you supplied on to git fetch, but no extras of its own, git fetch gets a refspec argument if and only if you passed one to git pull.

(Fetch refspecs are slightly different from push refspecs. git push origin main is equivalent to git push origin main:main. But git fetch origin main is equivalent to git fetch origin main:<discard>, with the side effect of also updating origin/main. If you like, you can run git fetch origin main:main, but this requires that you not be on that branch, except in the --update-head-ok special case that git pull arranges.)
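The refspec rules in the last two paragraphs can be checked with a small script. It is a sketch that assumes Git 2.32 or later, and at least 1.8.4 for the origin/main update. It uses a local upstream directory and a hypothetical commit_file helper. The script shows three things:

  • git fetch origin main leaves origin/feature stale.
  • git fetch origin main:main is refused while main is checked out.
  • git fetch origin feature:feature succeeds, because feature is not checked out.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git -C "$work/upstream" branch feature
git clone -q "$work/upstream" "$work/mine"

# Both branches move ahead on the upstream side.
commit_file "$work/upstream" on-main.txt
git -C "$work/upstream" checkout -q feature
commit_file "$work/upstream" on-feature.txt
git -C "$work/upstream" checkout -q main

cd "$work/mine"
git fetch -q origin main                # refspec "main": only origin/main is updated
stale=$(git rev-parse origin/feature)   # still the old feature commit

# Fetching into the checked-out branch is refused without --update-head-ok.
if git fetch -q origin main:main 2>/dev/null; then into_head=accepted; else into_head=refused; fi

# feature is not checked out here, so this creates it (and updates origin/feature).
git fetch -q origin feature:feature
```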

Adding in the second command

The second command that git pull runs is:

  1. git merge, by default, or
  2. git rebase, if you've told Git to do that, or
  3. git checkout, in the one special case.

Again, git pull passes options and arguments to the second command, and here things get messy. When git pull runs git merge, it passes:

  • merge options that the documentation describes as passed-through; plus
  • a -m option with a precomputed merge message (unless you supply your own -m); plus
  • the commit hash ID of the commit that is the branch tip of the branch name(s) on the remote, as selected.

That last one is a puzzle: what does "as selected" really mean? Well, let's go back to the git pull syntax:

git pull
git pull origin
git pull origin main

We know that these words, if supplied (origin and main), are passed through to git fetch. They specify the remote and, if there's a second word, the branch name as seen on that remote for the git fetch operation.

If we don't supply a branch name as seen on the remote, git pull requires that the current branch have an upstream set. The current branch is the one we're on, as in git status will say On branch main or whatever. (See also Why do I need to do `--set-upstream` all the time?) An upstream is technically a pair: a remote and a branch name as seen on that remote. These are normally presented to you in the more palatable remote-tracking name format. For example, the upstream of your main would typically be your origin/main, i.e., main as seen over on origin.
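This sketch shows how the upstream pair is stored and what happens without it. It assumes Git 2.32 or later, a local upstream directory, and a hypothetical commit_file helper. The values in the comments are what a fresh clone normally records.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git clone -q "$work/upstream" "$work/mine"   # clone sets main's upstream
cd "$work/mine"

# The upstream is stored as a pair: a remote plus a branch name on that remote.
remote=$(git config branch.main.remote)      # origin
merge=$(git config branch.main.merge)        # refs/heads/main
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}')  # origin/main
echo "$remote + $merge = $upstream"

# Without an upstream, an argument-free git pull has nothing to merge and stops.
git branch -q --unset-upstream
if git pull -q 2>/dev/null; then no_upstream=pulled; else no_upstream=refused; fi

git branch -q --set-upstream-to=origin/main  # put it back
```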

Your git pull command will fish the branch name out of the upstream, if needed. It does not pass this name on to git fetch, but it does use it later, for the second (git merge) command. At this point git pull uses .git/FETCH_HEAD to fish out the commit hash ID associated with main over on origin. git fetch still writes that file, just as it did in primeval Git before Git 1.5 was released more widely. That's the hash ID that git pull passes to git merge.

In other words, if you're on your main and its upstream is origin/main and you run:

git pull

your Git will run:

git fetch --update-head-ok

followed by, if using git merge:

git merge -m "Merge branch 'main' of <url>" <hash-ID>

where the URL and hash ID are those from origin and from .git/FETCH_HEAD.
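The FETCH_HEAD step can be imitated by hand. This is only a sketch of the idea, not how git pull is implemented internally. It assumes Git 2.32 or later, a local upstream directory with an extra feature branch, and a hypothetical commit_file helper. After an argument-free git fetch, the entry for the current branch's upstream is the one line in FETCH_HEAD that is not marked not-for-merge.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git -C "$work/upstream" branch feature
git clone -q "$work/upstream" "$work/mine"
commit_file "$work/upstream" news.txt       # upstream main moves ahead

cd "$work/mine"
git fetch -q                                # no arguments, as git pull would run it
fetch_head=$(git rev-parse --git-path FETCH_HEAD)
cat "$fetch_head"      # one line per branch; only main lacks "not-for-merge"

# Pick out the entry for the upstream branch and merge it with a pull-style message.
merge_hash=$(grep -v 'not-for-merge' "$fetch_head" | cut -f1)
url=$(git config remote.origin.url)
git merge -q -m "Merge branch 'main' of $url" "$merge_hash"
```

Because only upstream moved, this merge is a fast-forward, so the -m message is never used. With diverged branches, it would become the merge commit's subject.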

If you, yourself, run:

git fetch
git merge

you'll get the same effect, except that there is no -m option. The merge message will therefore be Git's default for merging a remote-tracking name, along the lines of Merge remote-tracking branch 'origin/main'. That is, the URL vanishes and the branch 'main' of ... part is phrased differently.
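The difference in merge messages is easy to see with two identical clones that each have a local commit. This is a sketch that assumes Git 2.32 or later, a local upstream directory, and a hypothetical commit_file helper. The --no-rebase and --no-edit options are there only so that newer Gits do not stop to ask how to reconcile the branches or open an editor. The second clone merges origin/main by name.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git clone -q "$work/upstream" "$work/a"
git clone -q "$work/upstream" "$work/b"

# Make real merges necessary: both sides get a new commit.
commit_file "$work/upstream" theirs.txt
commit_file "$work/a" mine.txt
commit_file "$work/b" mine.txt

git -C "$work/a" pull -q --no-rebase --no-edit     # message built for the pull
git -C "$work/b" fetch -q
git -C "$work/b" merge -q --no-edit origin/main    # Git's default message

msg_a=$(git -C "$work/a" log -1 --format=%s)
msg_b=$(git -C "$work/b" log -1 --format=%s)
echo "pull:        $msg_a"
echo "fetch+merge: $msg_b"
```

Both merge commits record the same tree. Only the wording of the message differs.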

But if you run:

git pull origin main

your git pull command will run:

git fetch --update-head-ok origin main
git merge -m <same message as before> <same hash ID as before>

That is, the extra origin main arguments get passed to git fetch, which limits what gets fetched.

We can also now see why we should not run:

git pull origin main feature

This would run:

git fetch --update-head-ok origin main feature

(which itself is fine), but then it will run:

git merge -m <message> <hash#1> <hash#2>

That is, your git pull will fish two hash IDs out of .git/FETCH_HEAD: one corresponding to main on origin, and one corresponding to feature on origin. It then passes both hash IDs to a single git merge command. This one git merge command will do what Git calls an octopus merge (see footnote 1).
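Here is the octopus case in miniature. It is a sketch that assumes Git 2.32 or later, a local upstream directory, and a hypothetical commit_file helper. The three lines of development touch different files, so the octopus strategy, which gives up on conflicts, can succeed. The --no-rebase and --no-edit options just keep newer Gits from asking questions.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git -C "$work/upstream" branch feature
git clone -q "$work/upstream" "$work/mine"

# Three lines of development that touch different files.
commit_file "$work/upstream" main.txt
git -C "$work/upstream" checkout -q feature
commit_file "$work/upstream" feature.txt
git -C "$work/upstream" checkout -q main
commit_file "$work/mine" local.txt

cd "$work/mine"
git pull -q --no-rebase --no-edit origin main feature   # two heads, one octopus merge
git log -1 --format='%s'
git rev-list --parents -n 1 HEAD    # the merge commit followed by its three parents
```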

(Those new to Git often seem to expect that:

git pull origin br1 br2

should check out br1 locally and fetch-and-merge origin/br1, then check out br2 locally and fetch-and-merge origin/br2, perhaps more efficiently than this somewhat clumsy sequential description suggests. That could make sense, and I believe I thought this myself at one point, but it's just not true.)

You can tell Git to use git rebase instead of git merge. There are now several ways to do this, such as setting pull.rebase to true, in addition to passing --rebase as an option to git pull. Git will then replace the git merge command with a git rebase command. This changes the set of options that can be passed through:

  • rebase does not accept -m , so you cannot give one;
  • rebase does not accept --ff-only or --no-ff , so you cannot give these.
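Here is a sketch of the rebase variant. It assumes Git 2.32 or later, a local upstream directory, and a hypothetical commit_file helper. Setting pull.rebase in the repository makes a plain git pull run git rebase as its second command. The local commit is then replayed on top of the fetched one instead of being merged with it.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git clone -q "$work/upstream" "$work/mine"
commit_file "$work/upstream" theirs.txt     # the branches diverge
commit_file "$work/mine" mine.txt

cd "$work/mine"
git config pull.rebase true    # per repository; git pull --rebase does the same once
git pull -q                    # fetch, then rebase instead of merge
git log --oneline --graph      # a straight line: no merge commit
```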

The git rebase command has a mode called autostash. In this mode, if your status is not "clean" (that is, git status would not say working tree clean, nothing to commit), git rebase effectively runs git stash push before it starts the rebase, and git stash pop at the end. I am not a fan of git stash in general, and unless you're pretty good at dealing with conflicts, I recommend not using this feature.

If autostash is disabled (which is the default), the rebase will refuse to start if the status is not "clean". With git merge as the second command, the rules are looser. The merge refuses to start if your uncommitted changes overlap files it needs to update, or if you have changes staged in the index relative to HEAD. Otherwise it goes ahead and leaves your unrelated work-tree changes alone. (I recall ancient Git versions behaving differently, with the same messy side effects as for git stash pop in some conflict cases.)
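The difference between the default and autostash can be sketched like this. The sketch assumes Git 2.32 or later, a local upstream directory, and a hypothetical commit_file helper. A tracked file is modified but not committed. git pull --rebase then refuses to start, while git pull --rebase --autostash stashes the change, rebases, and re-applies it.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git clone -q "$work/upstream" "$work/mine"
commit_file "$work/upstream" theirs.txt
commit_file "$work/mine" mine.txt

cd "$work/mine"
echo "work in progress" >> README     # uncommitted change to a tracked file

# Default (autostash off): pull --rebase refuses to start.
if git pull -q --rebase 2>/dev/null; then plain=started; else plain=refused; fi

# With autostash: stash, fetch + rebase, then re-apply the stash.
git pull -q --rebase --autostash
git status --short                    # README is still modified
```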

The last case is one that's only seen rarely. A Git repository can be in a special state, for which Git uses two different terms: an unborn branch or an orphan branch. This state exists in part because a new, totally empty repository has no commits at all.

A branch name, in Git, must contain the hash ID of some valid, existing commit. But when you run git init and it creates a new, totally empty repository, there is no commit. With no commits, there can be no branches. And yet, git status will say that you're on some branch, that there are no commits yet, and that you should make the first one.

In this state (this orphan or unborn branch state), the next commit you make will be a root commit. In a new empty repository, that is what you normally want: it's the first commit ever, and it starts the history. Now you have a commit and you can build on it.

When you run git pull while in this unborn-branch state, though, the git pull operation may get a bunch of commits from the remote (from origin, for instance). The second command is supposed to combine those new commits, as directed by the remaining git pull arguments, with the commits on the current branch. There are no commits on the current branch (which does not exist), but zero plus something is the something, right? So git pull declares that the result of this pull-into-empty-repository is that you should check out the commit at the tip of the branch you pulled. That is:

git init
git remote add origin <url>
git pull origin main

should have your Git reach out to the given URL, find their main, get commits from their Git, and create your origin/main. It should then create your own main as an exact match for the origin/main that your Git just created based on their main.
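Here is the pull-into-an-empty-repository case as a sketch. It assumes Git 2.32 or later, a local upstream directory standing in for <url>, and a hypothetical commit_file helper.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README

git init -q -b main "$work/fresh"        # unborn branch: main exists in name only
cd "$work/fresh"
git status                               # reports that there are no commits yet
git remote add origin "$work/upstream"
git pull -q origin main                  # fetch, then the special "check out" case
git log --oneline
```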

This last step amounts to a branch-creating git checkout -b or git switch -c, so that's effectively what git pull does here. (There was a bug, back in Git 1.5 or 1.6 or so, where if your working tree was non-empty, this git pull command would wipe it out entirely. This bug bit me at least once and is part of the reason I learned to avoid git pull. It has long been fixed. Still, I generally like to fetch, inspect, and then merge or rebase, and the inspecting means running git log between the fetch and the second (or rather, third) command. So I still use git pull only sparingly at best. But it now supports setting pull.ff to only as a configuration option, and that covers my most common case, so I am slowly warming up to it.)
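Here is a sketch of the pull.ff setting mentioned above. It assumes Git 2.32 or later, a local upstream directory, and a hypothetical commit_file helper. With pull.ff set to only, git pull fast-forwards when it can. Otherwise it stops after the fetch and leaves the new commits in origin/main. You can then inspect them with git log before choosing to merge or rebase yourself.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git clone -q "$work/upstream" "$work/mine"
cd "$work/mine"
git config pull.ff only                  # fast-forward or nothing

commit_file "$work/upstream" one.txt
git pull -q                              # only upstream moved: fast-forward allowed
after_ff=$(git rev-parse HEAD)
expected_ff=$(git -C "$work/upstream" rev-parse main)

commit_file "$work/upstream" two.txt     # now both sides have new commits
commit_file "$work/mine" local.txt
if git pull -q 2>/dev/null; then diverged=pulled; else diverged=refused; fi
git log --oneline HEAD..origin/main      # fetched but not merged: ready to inspect
```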


1 For more on octopus merges, see the git merge documentation. Note that if the two hash IDs are identical, the effect of this octopus merge is largely the same as that of a regular merge, except that octopus merges cannot handle conflicts. At least, not yet: Junio Hamano was musing a bit on whether the new merge-ort might be able to tackle this.

It's not clear to me that this is a good idea. In fact, it's somewhat clear to me that having octopus merge be weaker, and unable to handle merge conflicts, is a good thing.


What about ...

However, I seem to remember many times where git pull told me that everything was up to date, but fetch yielded new information.

If you run git pull origin main and get the up-to-date message, your current branch already has origin/main merged in and there's nothing to do. But if you then run git fetch origin (or just git fetch), you'll fetch all of their branches, updating all of your remote-tracking names.

If the upstream of the current branch is origin/main , you can run:

git pull

instead of:

git pull origin main

and the git fetch that git pull runs won't be limited to fetching only their main.
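The difference can be reproduced with a sketch. It assumes Git 2.32 or later, a local upstream directory, and a hypothetical commit_file helper. Only feature moves upstream. git pull origin main reports that everything is up to date and leaves origin/feature alone, while a plain git pull updates it.

```shell
#!/bin/sh
set -eu
# Ignore personal/system config so the demo behaves the same everywhere.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_file DIR NAME: hypothetical helper that commits one new file in DIR.
commit_file() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "add $2"
}

git init -q -b main "$work/upstream"
commit_file "$work/upstream" README
git -C "$work/upstream" branch feature
git clone -q "$work/upstream" "$work/mine"

# Only feature moves upstream; main stays put.
git -C "$work/upstream" checkout -q feature
commit_file "$work/upstream" feature.txt
git -C "$work/upstream" checkout -q main

cd "$work/mine"
git pull origin main       # "Already up to date."; the fetch is limited to main
limited=$(git rev-parse origin/feature)
git pull                   # no arguments: the fetch uses the remote's full refspec
plain=$(git rev-parse origin/feature)
```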

However, I seem to remember many times where git pull told me that everything was up to date, but fetch yielded new information.

That's merely because of how Git reports what happened.

git fetch updates all remote-tracking branches and reports on them, so any new commits at the remote site will yield some sort of output.

But git pull, though it also does a git fetch, only reports what happened on the current local branch. That might well be nothing, even if the fetch brought lots of commits into the remote-tracking branches. (Another good reason not to use pull!)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM