Trying to remove authentication from git pull creating strange merge conflicts and changes that shouldn't exist

Question

I have a script that I wrote to help automate large pull requests to master in git. I'm trying to get rid of the need for the user to log in when the script does things like pull to update the branch, so I've been trying to figure out how to do this. I ended up creating a personal access token in Bitbucket Server to see if I could get it to work for myself, and it does work. A personal token wouldn't work for everyone, but I was hoping to use it to work out the right syntax.

The command I came up with is this:

subprocess.check_call(['git', 'pull']+[f'https://{username}:{MYTOKEN}@{repo_url}'], cwd=repo_path)

But I get really weird behavior from it: it pulls in a bunch of files from the script I made, and then a bunch that I didn't touch. In both cases, I never pushed anything to the remote branch or committed anything on the local branch, and there's nothing in the staging area either.

So I tried the following to see what would happen, and I get the correct behavior: it says my repo is up to date and there's nothing to pull. This matches the manual git pull I had been doing, since I didn't actually touch any files. But it requires the user to enter credentials, which is what I was trying to get rid of.

subprocess.check_call(['git', 'pull'], cwd=repo_path)

Any idea what would cause something like this?

Answer 1

First, let me say that you generally should not use git pull here, because you're writing a script that should not be interactive, and git pull tries to be interactive when it runs its second command. (There are some ways to work around this, especially in modern versions of Git, but it will help to break things up into the two separate steps here.)

With that out of the way, regardless of whether you use git pull to run git fetch, or run git fetch yourself, there is one key difference between:

git <command>

and

git <command> https://username:password@bitbucket.server/path/to/repo

and that is that the second command provides a URL. This key difference actually matters twice, as we'll see.
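If a script does build a URL like the one above, any reserved characters in the username or token (for example /, @ or :) have to be percent-encoded, or Git will split the URL in the wrong place. A minimal sketch, assuming repo_url is the host-and-path part without a scheme, as in the question; authenticated_url is a hypothetical helper name, and the example values are made up:

```python
from urllib.parse import quote


def authenticated_url(username: str, token: str, repo_url: str) -> str:
    """Return https://<user>:<token>@<repo_url> with the credentials percent-encoded.

    repo_url is assumed to be "host/path" without a scheme, as in the question.
    """
    # safe="" also encodes "/", which quote() leaves alone by default;
    # "@" and ":" are encoded either way.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@{repo_url}"


# Hypothetical values, for illustration only:
print(authenticated_url("jdoe", "abc/def+ghi", "bitbucket.example.com/scm/proj/repo.git"))
# https://jdoe:abc%2Fdef%2Bghi@bitbucket.example.com/scm/proj/repo.git
```

This only makes the URL well-formed; it does nothing about the token being visible on the command line, which is discussed further down.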

The first command lacks any URL-or-remote argument, so it looks up the remote based on the current branch, or uses origin as the remote if there is no remote for the current branch (or there is no current branch, as in detached HEAD state). This is true regardless of whether the command here is fetch or pull, because git pull invokes git fetch with the argument you provided. So either way, we run git fetch: one way with no extra arguments, and the other way with a URL.
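The lookup described above can be modeled in a few lines. This is a simplified sketch of the rule as stated, not Git's own code: it assumes the per-branch branch.<name>.remote settings have already been read into a plain dict, and default_remote is a hypothetical name.

```python
from typing import Mapping, Optional


def default_remote(current_branch: Optional[str],
                   branch_remotes: Mapping[str, str]) -> str:
    """Model of how a bare "git fetch" or "git pull" picks its remote.

    current_branch is None in detached HEAD state.
    branch_remotes maps a branch name to its branch.<name>.remote value.
    """
    if current_branch is not None and current_branch in branch_remotes:
        return branch_remotes[current_branch]
    # No current branch, or no remote configured for it: fall back to origin.
    return "origin"


print(default_remote("feature", {"feature": "upstream"}))  # upstream
print(default_remote(None, {"feature": "upstream"}))       # origin
```

Passing a URL skips this lookup entirely, which is why the two commands behave differently from here on.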

When git fetch is given a remote, this enables some nice features. In particular, git fetch will update the corresponding remote-tracking names (such as origin/master). This isn't the immediate source of the problem, but updating remote-tracking names is a good thing. Providing a URL prevents git fetch from updating the remote-tracking names. This isn't crucial, but it's a problem that you can't easily solve. Just keep it in mind: it will be a minor, but constant, annoyance later.

More importantly, though, this affects the second command that git pull runs. When git fetch finishes, it writes a file named FETCH_HEAD in the Git repository directory (typically .git). When running git fetch with a remote, we end up with contents like this:

$ cat .git/FETCH_HEAD
1c52ecf4ba0f4f7af72775695fee653f50737c71        branch 'master' of <url>
898f80736c75878acc02dc55672317fcc0e0a5a6    not-for-merge   branch 'maint' of <url>
bcca9488540da62a407e744ef77a8abcf8e92efe    not-for-merge   branch 'next' of <url>
1c4d5706c6ff6a04567b24d4b3168b09793a83f9    not-for-merge   branch 'seen' of <url>
32af5571f1841d138c786b68d4ec8c6a07752540    not-for-merge   branch 'todo' of <url>
a8eaf9de52c2d49799d7dc724e688ccbfa74390c    not-for-merge   tag 'v2.30.0-rc0' of <url>

When we run the same command with a URL—even the same URL that git fetch origin would use—we get instead:

1c52ecf4ba0f4f7af72775695fee653f50737c71        <url>

Note that all the various branch and tag names are missing, as are the not-for-merge lines that the next step is supposed to ignore. The only hash ID in the FETCH_HEAD file is the one corresponding to HEAD in the other Git repository at the given URL.
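To see this difference from a script, you can read .git/FETCH_HEAD after a fetch. Below is a sketch of a parser for both layouts, assuming each line is a hash ID, a tab, an empty field or the not-for-merge marker, a tab, and a description (the samples above show that whitespace as spaces). FetchHeadEntry and parse_fetch_head are hypothetical names.

```python
from typing import List, NamedTuple


class FetchHeadEntry(NamedTuple):
    object_id: str
    for_merge: bool
    description: str


def parse_fetch_head(text: str) -> List[FetchHeadEntry]:
    """Parse .git/FETCH_HEAD content into entries.

    Assumes three tab-separated fields per line: hash ID, an empty field or
    "not-for-merge", and a description such as "branch 'master' of <url>".
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        object_id, marker, description = line.split("\t", 2)
        entries.append(FetchHeadEntry(object_id, marker != "not-for-merge", description))
    return entries


with_remote = ("1c52ecf4ba0f4f7af72775695fee653f50737c71\t\tbranch 'master' of <url>\n"
               "898f80736c75878acc02dc55672317fcc0e0a5a6\tnot-for-merge\tbranch 'maint' of <url>\n")
with_url = "1c52ecf4ba0f4f7af72775695fee653f50737c71\t\t<url>\n"
print(len(parse_fetch_head(with_remote)), len(parse_fetch_head(with_url)))  # 2 1
```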

This is probably where things go wrong

The second command that git pull runs is:

git <command> <options> <hash>

The command part here is normally either git merge or git rebase (there is one very special case, which won't apply here, where it is neither). The options depend on the command, since git merge gets a -m option while git rebase does not (but can get other options). The hash is the real problem here.

The hash ID that git pull supplies to either git merge or git rebase comes out of that .git/FETCH_HEAD file. When using a remote, one particular line of that file will correspond to the upstream of the current branch, and that's the hash ID that Git will use. But when git fetch is given a URL and no branch name, the fetch command writes only one hash ID: that of the other Git repository's HEAD. If that HEAD is not the branch you are updating (for instance, the other repository's HEAD is master while you are on a feature branch), the second command will use the wrong hash ID.
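Here is a simplified model of that selection step, not Git's implementation: the candidates are the lines that are not marked not-for-merge. It assumes the same tab-separated layout as the FETCH_HEAD samples above; merge_candidates is a hypothetical name and the hash IDs are made-up short placeholders.

```python
from typing import List


def merge_candidates(fetch_head_text: str) -> List[str]:
    """Return the hash IDs a following merge or rebase step would be given.

    Simplified model: every line whose second tab-separated field is not
    "not-for-merge" is a candidate.
    """
    candidates = []
    for line in fetch_head_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[1] != "not-for-merge":
            candidates.append(fields[0])
    return candidates


# Hypothetical hash IDs. On a branch named feature, fetching from the remote
# marks feature's upstream for merging...
by_remote = ("aaa111\t\tbranch 'feature' of <url>\n"
             "bbb222\tnot-for-merge\tbranch 'master' of <url>\n")
# ...while fetching from the bare URL records only the other repository's
# HEAD, assumed here to point at master.
by_url = "bbb222\t\t<url>\n"
print(merge_candidates(by_remote))  # ['aaa111']
print(merge_candidates(by_url))     # ['bbb222']
```

In the second case the merge or rebase would bring master's commits into the feature branch, which would show up as exactly the kind of unexpected files described in the question.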

This is almost certainly what is happening.

How to fix this

You can fix it by:

supplying the right name to git pull so that it can pass it on to git fetch, and/or
running git fetch yourself, then running a second Git command yourself if/as needed.

Given that git pull is designed to be interactive and hence changes the way it behaves based on the user's preference configuration settings, I'd suggest doing both: run git fetch yourself, then figure out what second command you'd like to run.
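A minimal sketch of the two-step approach, assuming the remote is called origin, that the currently checked-out branch should be fast-forwarded to a remote-tracking name such as origin/master, and that a fast-forward-only merge is the second command you want. The function name and its run parameter are hypothetical; run defaults to subprocess.check_call, so a failing step raises CalledProcessError, and it can be swapped out for testing.

```python
import subprocess
from typing import Callable, List

Runner = Callable[..., object]


def fetch_then_ff_merge(repo_path: str, remote: str = "origin",
                        upstream_branch: str = "master",
                        run: Runner = subprocess.check_call) -> List[List[str]]:
    """Replace "git pull" with an explicit fetch plus a fast-forward-only merge.

    Fetching by remote name (not URL) updates refs/remotes/<remote>/*, so the
    merge can name the remote-tracking ref instead of relying on FETCH_HEAD.
    """
    steps = [
        ["git", "fetch", remote],
        # --ff-only never creates a merge commit; it fails if the local
        # branch has diverged, which the script can detect and report.
        ["git", "merge", "--ff-only", f"{remote}/{upstream_branch}"],
    ]
    for argv in steps:
        run(argv, cwd=repo_path)
    return steps
```

For example, fetch_then_ff_merge(repo_path, "origin", "feature") would run git fetch origin followed by git merge --ff-only origin/feature in that repository.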

The fetch may still need the access token. (Note, by the way, that passing the token on the command line makes it readable to other processes on the machine, so it's not very secure. It's perhaps a bit more secure to set up a remote that holds the token, in a file that's secured, then use a remote name. The remote name will control the remote-tracking names, which continue to be that ongoing annoyance I mentioned earlier. There are further workarounds for this, but once you have some remote-tracking names, the annoyance level is really fairly low, and this might be good enough.) But now, regardless of how you do it, you can give the fetch the name of any branch or tag you want resolved in the other Git repository, so that it finds the right commit.
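One alternative to putting the token on the command line, swapped in here rather than taken from the answer: Git 2.31 and later read extra configuration from the GIT_CONFIG_COUNT, GIT_CONFIG_KEY_<n> and GIT_CONFIG_VALUE_<n> environment variables, so a script can give fetch an http.extraHeader carrying HTTP Basic credentials without the token appearing in the argument list. This sketch assumes Git 2.31+ and a server that accepts the username and token as Basic credentials (as the URL form in the question already does); git_env_with_token is a hypothetical name. The environment is still readable by the same user and by root, so this narrows the exposure rather than removing it.

```python
import base64
import os
from typing import Dict, Mapping, Optional


def git_env_with_token(username: str, token: str,
                       base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return an environment that adds http.extraHeader via GIT_CONFIG_* variables.

    Assumes Git 2.31 or later. Existing GIT_CONFIG_COUNT entries are kept.
    """
    env = dict(os.environ if base_env is None else base_env)
    index = int(env.get("GIT_CONFIG_COUNT", "0"))  # append after existing entries
    credentials = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    env[f"GIT_CONFIG_KEY_{index}"] = "http.extraHeader"
    env[f"GIT_CONFIG_VALUE_{index}"] = f"Authorization: Basic {credentials}"
    env["GIT_CONFIG_COUNT"] = str(index + 1)
    # Disable Git's own terminal prompt so a rejected header fails instead of
    # waiting for input (a configured credential helper may still run).
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env
```

A call might look like subprocess.check_call(['git', 'fetch', 'origin'], cwd=repo_path, env=git_env_with_token(username, token)), with the token read from somewhere other than the command line, such as a protected file or an environment variable.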

The second command can still be git merge, perhaps with --ff-only, or git rebase, with a check that it succeeds and a rollback on failure. Or it could even be a git checkout that uses a detached HEAD, rather than an attempt to change any existing branch names in the local repository. The important thing, though, is that by knowing that git pull really means "fetch, then run a second command", and knowing what the various options are here, you can take control of all the parts of the two operations. You won't be at the whims of git pull using the wrong options and/or wrong branch names.
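If the second command is a rebase, the check-and-rollback mentioned above might look like the following sketch. It assumes the remote-tracking name (origin/master here, a hypothetical default) was just updated by git fetch, and that git rebase --abort is the rollback you want; rebase_or_rollback and its run parameter are hypothetical names.

```python
import subprocess
from typing import Callable

Runner = Callable[..., object]


def rebase_or_rollback(repo_path: str, upstream: str = "origin/master",
                       run: Runner = subprocess.check_call) -> bool:
    """Rebase the current branch onto upstream; roll back and return False on failure."""
    try:
        run(["git", "rebase", upstream], cwd=repo_path)
    except subprocess.CalledProcessError:
        # A rebase stopped by a conflict leaves the repository mid-operation;
        # --abort returns the branch to where it was before the rebase.
        try:
            run(["git", "rebase", "--abort"], cwd=repo_path)
        except subprocess.CalledProcessError:
            pass  # no rebase was in progress (it refused to start)
        return False
    return True
```

Returning a flag instead of raising lets the calling script decide whether to stop, report the conflict, or fall back to a merge.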

1 answers

solution1
1 2020-12-17 07:00:10
