
Disassociating Files in Dev Branch from Master Branch Counterparts So They Remain Separate After Merging

Background

I have two branches (actually many) that have diverged and need consolidating in order to productionise a process.

The file names in the dev branch are shared with the master branch. Some of the dev-branch files with shared names are 'ready' for production; the others are 'not-ready' because they require rework and/or are expected to create nasty conflicts.

For the 'not-ready' files with shared names, I'm keen to disassociate them from their master counterparts so they remain separate after the branches are merged. I have tried renaming the dev-branch files to something else, e.g. git mv NewFile.txt DevFile.txt, but merging simply incorporates the rename in addition to the normal file-content merge behaviour.

Core Question

Is there some way to disassociate files in master and dev branches so the files remain separate after merging? Ideally whilst also preserving their histories?

Further details

I've tried to include a simplified worked example in a public repo on GitHub. All the shell commands to add/mv/merge etc. are included in script.sh. To take a look, you should be able to clone the example with git clone git@github.com:NedScandrett1/TestRepo.git

Answer

Is there some way to disassociate files in master and dev branches so the files remain separate after merging?

Not exactly. You may, however, be able to get "close enough". See below.

Ideally whilst also preserving their histories?

In Git, files don't have history. Commits are the history, and if you run:

git log

you see all the commits—all the history—starting from the current commit and working backwards.

Git always works backwards like this, because commits themselves work backwards internally:

  • Each commit has a full snapshot of every file, plus some metadata: information about the commit itself.

  • The snapshot stores all the files that Git knew about at the time you, or whoever, made the snapshot. None of these can ever be changed: they're part of the commit, and the commit's number—its hash ID—depends on these file contents.

  • The metadata tells Git who made the commit, when, and why (their log message). Along with this, Git stores a list of the hash IDs of the previous commit(s), the commit's parents; usually there is just one. This, too, can never be changed: this particular commit's hash ID, i.e., its number in your Git's database of "all commits we have", is determined by these contents (the data and metadata).
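Both properties (the parent link and a hash ID derived from the contents) can be seen directly in a commit object. What follows is a minimal sketch, assuming only a POSIX shell with git on PATH. It builds a hypothetical two-commit repository in a temporary directory and prints the raw commit. It then checks that the parent line names the previous commit, and that re-hashing the object's bytes reproduces its hash ID.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two > file.txt && git commit -q -a -m "second"

# The raw commit object: a tree (the snapshot), a parent line, author,
# committer, and the log message.
git cat-file -p HEAD

# The parent line stores the hash ID of the previous commit.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
previous=$(git rev-parse HEAD~1)

# Hashing the object's bytes again reproduces the commit's own hash ID,
# which is why no part of a commit can change without changing its ID.
rehashed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
current=$(git rev-parse HEAD)
echo "parent line: $parent"
echo "HEAD~1:      $previous"
```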

The metadata are responsible for this backwards-connecting string. If we draw a set of simple single-parent commits, replacing their actual hash IDs with one-uppercase-letter names for our own sanity, we get something that looks like this:

... <-F <-G <-H

where H stands in for the hash ID (the commit number) of the last commit in the sequence. (Side note: Git finds this hash ID in the branch name, which by definition holds the hash ID of the last commit in the chain. Updating the branch is done by sticking a new hash ID into the branch name. That hash ID, whatever it is, becomes the last commit in the branch.)
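A branch name is just a stored hash ID, and that ID moves forward when a new commit is made. The following minimal sketch assumes git on PATH and a hypothetical throwaway repository whose branch is named master. It records what master names, makes one more commit, and then checks two things: master now names the new commit, and that new commit's parent is the old tip.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

echo G > file.txt && git add file.txt && git commit -q -m "G"
echo H > file.txt && git commit -q -a -m "H"

before=$(git rev-parse refs/heads/master)   # master holds H's hash ID
echo I > file.txt && git commit -q -a -m "I"
after=$(git rev-parse refs/heads/master)    # committing stored I's hash ID in master
parent_of_tip=$(git rev-parse "$after^")    # I links back to H

echo "master before: $before"
echo "master after:  $after (parent $parent_of_tip)"
```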

We (humans, that is) like to view commits as a set of changes. To do so, Git has to use two snapshots. For instance, to see what changed in H, Git will extract the snapshots of both G and H, and compare them. For whatever is the same, Git says nothing; for whatever is different, Git tells us how to change the copy in G to match the copy in H. So that shows us what changed. But in fact, all we have is snapshots.
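Both commits hold complete snapshots, and "what changed" is computed by comparing them. The minimal sketch below shows this, assuming git on PATH and a hypothetical two-file repository in which HEAD~1 plays the role of G and HEAD the role of H. Each commit lists both files, but the diff reports only the one that differs.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

printf 'a\n' > a.txt && printf 'b\n' > b.txt
git add a.txt b.txt && git commit -q -m "G"
printf 'b, changed\n' > b.txt && git commit -q -a -m "H"

# Each commit's snapshot lists every file, not only the changed one.
g_files=$(git ls-tree -r --name-only HEAD~1 | grep -c .)
h_files=$(git ls-tree -r --name-only HEAD | grep -c .)

# "What changed in H" is derived by comparing the two snapshots.
changed=$(git diff --name-only HEAD~1 HEAD)
git diff HEAD~1 HEAD
```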

As I said in the parenthetic side note, a branch name simply holds the hash ID of the last commit in some chain. Being last in some chain doesn't mean it's the very last. For instance, if we have:

...--G--H--I   <-- last

we can add a new name pointing to existing commit H:

...--G--H   <-- interesting
         \
          I   <-- last

The branch named interesting is now the chain of commits that ends at H, while the branch named last is the chain of commits that ends at I. Commit H is on branch last; it's just not the last one of branch last.
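A single git branch command adds a name for an existing commit, as in the drawing. The following minimal sketch assumes git on PATH. The repository and its three commits are hypothetical, and the initial branch is renamed to last to match the drawing. The checks confirm that interesting names H, and that H is still an ancestor of (that is, on) branch last.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

for c in G H I; do
    echo "$c" > file.txt && git add file.txt && git commit -q -m "$c"
done
git branch -m master last        # match the drawing's branch name
git branch interesting last~1    # new name pointing to existing commit H

git log --oneline --graph --decorate --all

interesting=$(git rev-parse interesting)
h=$(git rev-parse last~1)
# H is still on branch last: it is an ancestor of last's tip, I.
if git merge-base --is-ancestor interesting last; then h_on_last=yes; else h_on_last=no; fi
```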

With that in mind, we can look back at the first question:

I have tried renaming dev branch file names to something else eg git mv NewFile.txt DevFile.txt, however merging simply incorporates the rename in addition to normal file content merge behaviour. ... so that... files remain separate after merging...

When we use git merge, here's what we are really doing:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

We've run git checkout br1 to pick commit J as the current commit, via branch name br1 as the current branch. That's what the (HEAD) attached to br1 tells us in these drawings. Then we run git merge br2.

Git now:

  1. Uses the graph (the connections from J backwards, and from L backwards) to find the merge base commit. This is the best shared commit: the best commit that is on both branches. There is usually just one best shared commit, and in this particular drawing it's obvious: it is commit H. Commit G is also shared, but it's not as good as H because it's "further back".

  2. Runs a pair of git diff --find-renames commands (or internal equivalent thereof, really). What Git needs here is to figure out what has changed since the shared starting point—commit H in this case—to produce the snapshots in commits J and L respectively. That takes two separate git diff commands.

    If you have renamed some files, it's this --find-renames part that figures this out. All Git has is snapshots. Commit H has some set of files, commit J has some set of files, and commit L has some set of files. If commit H has NewFile.txt but no DevFile.txt, and commit L has DevFile.txt but no NewFile.txt, well, perhaps those are really "the same file", whatever that means.

    Figuring out this "sameness", if there is any, is the job of the --find-renames option. When using git diff, you control the find-renames settings. When you use git merge, git merge controls them; but read on.

  3. To proceed with the merging, Git now tries to combine the two sets of changes found in step 2. There is a great deal of detail here that we'll just skip right over. :-) Assuming that Git is able to combine these, though, Git then applies the combined changes to the files found in H. This preserves our changes (H vs. J, on br1) while adding their changes (H vs. L, on br2).

  4. If Git is able to combine all of this on its own, it will go on to make a new merge commit, unless we tell it --no-commit on the git merge line. If Git encounters merge conflicts, it stops and leaves us a big mess to clean up. This is when those details we skipped in step 3 become important. If it stops (per --no-commit, or because of conflicts), it still records the right stuff to make a new merge commit when we finish the merge.
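Steps 1 and 2 can be approximated by hand. git merge does this work internally, so the following is only a sketch of the idea, assuming git on PATH. The repository, the branch names br1 and br2, and the rename of NewFile.txt to DevFile.txt on br2 are all hypothetical, built to match the drawing. The sketch shows git merge-base picking H, and the base-vs-br2 diff reporting the rename as R100.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

printf 'line %s\n' 1 2 3 4 5 6 7 8 > NewFile.txt
git add NewFile.txt && git commit -q -m "G"
echo shared > other.txt && git add other.txt && git commit -q -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b br2
git mv NewFile.txt DevFile.txt && git commit -q -m "K: rename"
echo theirs >> other.txt && git commit -q -a -m "L"

git checkout -q -b br1 master
echo ours > br1.txt && git add br1.txt && git commit -q -m "I"
echo more >> br1.txt && git commit -q -a -m "J"

# Step 1: the merge base, found from the commit graph.
base=$(git merge-base br1 br2)
# Step 2: the two comparisons, base-vs-ours and base-vs-theirs.
ours=$(git diff --find-renames --name-status "$base" br1)
theirs=$(git diff --find-renames --name-status "$base" br2)
printf 'ours:\n%s\ntheirs:\n%s\n' "$ours" "$theirs"
```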

Once the merge is done, we end up with this:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

Commit M is a merge commit, and its special feature is that it links backwards to both commit J and commit L.
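The two backward links of a merge commit can be listed directly. The following minimal sketch assumes git on PATH. It builds the hypothetical br1/br2 graph from the drawing without conflicts, merges br2 into br1, and checks that M's first parent is J and its second parent is L.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

echo base > base.txt && git add base.txt && git commit -q -m "H"

git checkout -q -b br2
echo k > k.txt && git add k.txt && git commit -q -m "K"
echo l > l.txt && git add l.txt && git commit -q -m "L"
l=$(git rev-parse HEAD)

git checkout -q -b br1 master
echo i > i.txt && git add i.txt && git commit -q -m "I"
echo j > j.txt && git add j.txt && git commit -q -m "J"
j=$(git rev-parse HEAD)

git merge -q -m "M" br2     # on br1 (HEAD), as in the drawing

# One line: M's hash ID followed by its parents' hash IDs.
git rev-list --parents -n 1 HEAD
first=$(git rev-parse HEAD^1)
second=$(git rev-parse HEAD^2)
count=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
```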

When Git is walking backwards through history, one commit at a time, it has a bit of a problem, because it now has to walk backwards to two commits at the same time. Without getting into a lot of detail, one thing that Git does here is that it doesn't show what changed, because showing what changed only works right when we pick out one single before-commit to go with our one after-commit. With a merge commit, there are two (or more) "before" commits. (Merges that have more than two parents exist, and are called octopus merges, but we won't deal with them here.)

(There are some tricks you can use here, like -m to "split" a merge or --first-parent to avoid looking at anything but the first parent of a merge commit. We won't cover those here either; they are more useful for forensic work with bad merges later, rather than for achieving good merges now.)

Rename detection in general and in merges

This kind of "detect a rename" operation is important for:

  • git diff, which in Git 2.9 and later does it by default (controlled by the diff.renames setting);
  • git log --follow, which we'll describe in a moment; and
  • git merge, where you've encountered it already.

When git merge is running its two git diff operations, merge is the one controlling the rename detection. But there are command-line options to git merge to tell it how to control this:

  • -X find-renames=<number> tells Git what rename threshold to use;
  • -X no-renames completely disables rename detection.

(In very old versions of Git, -X find-renames=N is spelled -X rename-threshold=N.)

The default value for rename detection, when it is enabled, is 50. This value is a percentage, although exactly what it's a percentage of is a bit iffy. This means the possible values range from zero to 100. Zero never actually happens, however, and 100 is reserved for "the file contents match exactly", so useful values are normally 1 through 99. The default 50 means "50% similar".

Whenever Git is comparing two files—let's call them L and R, for Left and Right side files—it can compute a "similarity index" value. This is based on observations Git makes in trying to decide how many bytes from L are retained in R, and how many bytes in R are either all-new (weren't in L at all) or require deleting stuff from L and inserting new stuff, or whatever. The actual computation is obscure: it's not documented anywhere. But, if you run git diff --find-renames yourself, and give it two commit hash IDs, Git will tell you, whenever it finds a rename, what the similarity index was. So:

git diff --find-renames=01 --name-status L R

will compare commit L to commit R and, for each file that exists in R but not in L, try to guess whether it was renamed from a file that exists in L but not in R. With the 01 value here, it will accept a match as low as "similarity index 1%" (1% similar) and take that as a rename. The --name-status option then makes git diff print out just the file names and statuses, and R (which means detected a rename) will be followed by the actual similarity.

Adding --diff-filter=R to our git diff makes it print only the R status files, so that we see what the actual similarity index values were, for each detected rename.
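The following minimal sketch shows that command in use, assuming git on PATH; the file names are hypothetical. One file is renamed unchanged and another is renamed with one line edited. The first should therefore report R100, and the second a similarity below 100.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

printf 'exact %s\n' 1 2 3 4 5 6 7 8 9 10 > exact.txt
printf 'edited %s\n' 1 2 3 4 5 6 7 8 9 10 > edited.txt
git add exact.txt edited.txt && git commit -q -m "L"

git mv exact.txt exact-renamed.txt
git mv edited.txt edited-renamed.txt
# Change one of the ten lines in the second renamed file.
sed 's/^edited 5$/edited 5, changed on the way/' edited-renamed.txt > tmpfile
mv tmpfile edited-renamed.txt
git add -A && git commit -q -m "R"

# Only renames, with their similarity index, at a 1% threshold.
renames=$(git diff --find-renames=01 --name-status --diff-filter=R HEAD~1 HEAD)
printf '%s\n' "$renames"
```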

If Git is not detecting renames that you want it to detect, you can try your git merge with -X find-renames=25 or some smaller value. Note that 2 means 20%, not 2%; you must write 02 or 2% to mean 2%.

If Git is detecting renames that you don't want it to detect, you can try your git merge with -X find-renames=75 or something larger, or even -X no-renames.
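The sketch below replays the question's situation in a hypothetical repository, assuming git on PATH. On dev, NewFile.txt is renamed to DevFile.txt and edited; on master, NewFile.txt is edited. A default merge follows the rename and folds both edits into DevFile.txt. With -X no-renames, DevFile.txt is instead treated as a new file, and NewFile.txt becomes a modify/delete conflict. The sketch resolves that conflict by keeping master's copy, so the two files stay separate. This is the "close enough" outcome rather than true disassociation: the conflict still has to be resolved by hand.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > NewFile.txt
git add NewFile.txt && git commit -q -m "base"

git checkout -q -b dev
git mv NewFile.txt DevFile.txt
sed 's/^line 9$/line 9 (dev)/' DevFile.txt > tmpfile && mv tmpfile DevFile.txt
git commit -q -a -m "dev: rename and edit"

git checkout -q master
sed 's/^line 2$/line 2 (master)/' NewFile.txt > tmpfile && mv tmpfile NewFile.txt
git commit -q -a -m "master: edit"

# 1) Default merge: the rename is detected and both edits end up in DevFile.txt.
git checkout -q -b with-renames master
git merge -q -m "merge dev" dev
if [ -e NewFile.txt ]; then default_newfile=present; else default_newfile=absent; fi
if grep -q '(master)' DevFile.txt; then default_merged=yes; else default_merged=no; fi

# 2) Rename detection off: NewFile.txt is a modify/delete conflict
#    (edited on master, deleted on dev) and DevFile.txt is just a new file.
git checkout -q -b no-renames master
if git merge -q -X no-renames -m "merge dev" dev; then status=clean; else status=conflict; fi
# Resolve by keeping master's NewFile.txt alongside dev's DevFile.txt.
git add NewFile.txt
git commit -q -m "merge dev, keeping NewFile.txt and DevFile.txt separate"
```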

git log --follow

Note that when using git log --follow, you must give Git exactly one pathname:

git log --follow path/to/file.ext

What Git does here is to start out with the usual backwards, one-at-a-time commit walk through some commit graph. Let's say we have this graph at this point:

          I--J
         /    \
...--G--H      M--N--O   <-- somebranch (HEAD)
         \    /
          K--L

The first comparison is commit N vs. commit O. Git checks to see whether the file named path/to/file.ext was modified, by running git diff --find-renames N O internally (with some shortcuts turned on to not bother looking at other file names). If the copy of that file is identical in both N and O, Git moves back to N without showing commit O at all. If the file exists in both commits but is different, Git shows commit O and moves back to N. If the file doesn't exist in N, but the diff from N to O can find a rename, Git shows commit O and moves back to N, but this time, since the file was renamed from (say) old/path/to/file.ext, Git starts looking for that name now.
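The following minimal sketch shows that rename-following walk on a linear history, assuming git on PATH; the repository and the name old.txt are hypothetical. Without --follow, git log stops at the commit that created path/to/file.ext. With --follow, it continues into the commit that added old.txt.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

printf 'line %s\n' 1 2 3 4 5 6 7 8 > old.txt
git add old.txt && git commit -q -m "A: add old.txt"
mkdir -p path/to
git mv old.txt path/to/file.ext && git commit -q -m "B: rename to path/to/file.ext"
echo "line 9" >> path/to/file.ext && git commit -q -a -m "C: edit path/to/file.ext"

# Plain path-limited log: the file "begins" at B, where the new name appears.
plain=$(git log --format=%s -- path/to/file.ext | grep -c .)
# With --follow, the rename found at B switches the walk to old.txt.
followed=$(git log --follow --format=%s -- path/to/file.ext | grep -c .)
git log --follow --name-status --format=%s -- path/to/file.ext
```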

So, if we've moved from O to N, we're now looking for whatever the name is in commit N. We now compare commit M (a merge commit: it still has a snapshot, just like any other commit) to commit N, to see if the file changed and/or changed names. If it did change or change names (or both), git log --follow shows commit N; if not, it doesn't show N.

Here, because M is a merge commit, things get tricky. If we haven't disabled what Git calls History Simplification, Git now looks at all parents of M to find any parent in which the file (whatever its name is at this point) matches. If it finds one of these, Git will go down that leg of the merge only. This is History Simplification in action.

You can prevent this with --full-history, but this interacts poorly with the rename following that --follow does, as we'll see.
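History Simplification can be observed in a small, hypothetical repository, assuming git on PATH. A side branch changes f.txt and then changes it back, so the merge commit matches its first parent for that file. By default, git log -- f.txt follows only that parent and never visits the side branch; --full-history visits both legs. This minimal sketch counts the commits each mode shows.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

echo a > f.txt && git add f.txt && git commit -q -m "H: add f.txt"
git checkout -q -b side
echo b > f.txt && git commit -q -a -m "K: change f.txt"
echo a > f.txt && git commit -q -a -m "L: change it back"
git checkout -q master
echo g > g.txt && git add g.txt && git commit -q -m "J: unrelated file"
git merge -q -m "M: merge side" side

# Default: M matches its first parent J for f.txt, so only that leg is walked.
default=$(git log --format=%s -- f.txt | grep -c .)
# --full-history walks both legs and also finds K and L.
full=$(git log --full-history --format=%s -- f.txt | grep -c .)
git log --full-history --format=%s -- f.txt
```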

Let's say the name didn't change yet, so that we still have path/to/file.ext. This is our file's name in commit O, and N, and M. This is also the name in commit L, but in commit K the file is different and/or has a different name.

Since the file does match in L, our git log --follow will walk from M to L, and will proceed to look at commits L, then K, then H, then G, and so on.

If we add --full-history, our git log --follow will also walk from M to J. However, if it already went down the L-to-K leg, and the file got renamed in that L-to-K changeover, Git will be looking for the wrong file name in the J and I commits!

Conclusions

The things to take away here are:

  • There is no such thing as file history. There is only commit history, because commits are history.
  • However, Git has some tools, such as git log --follow, that can attempt to produce a synthesized, reduced "file history" by selecting interesting parts of commit history.
  • This can work well, but it can also go wrong. Be aware of history simplification.
  • The --follow that Git does today is not very good. It depends on rename detection—which isn't awful, but isn't great to start with—and adds some very sloppy assumptions atop that, which makes it even less good. Improving it will be hard (I had a look at this about a decade ago, and neither I nor anyone else has fixed it since then).
  • Merge depends on this same rename-detection system.

You may be able to get useful results from fussing with the rename detection threshold. Note, though, that you'll probably want to make some sort of change to the file (even if it's just adding a big comment) to push its similarity index below whatever rename threshold the merge uses.
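Whether such a change is enough can be checked before merging. The following minimal sketch assumes git on PATH; the 20-line file and the comment text are hypothetical. Prepending a large comment to the renamed copy pushes its similarity to the original well below 50%. As a result, git diff at the default threshold reports a delete plus an add, while a 1% threshold still pairs the two files as a rename. A merge left at its default threshold would be expected to make the same call as the default diff. Keep in mind that the similarity numbers come from Git's own undocumented estimate.

```shell
#!/bin/sh
set -eu
# Throwaway repository, removed on exit; personal/system Git config is ignored.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q "$tmp/repo" && cd "$tmp/repo"

i=1
while [ "$i" -le 20 ]; do echo "original content line $i"; i=$((i + 1)); done > NewFile.txt
git add NewFile.txt && git commit -q -m "base"

git mv NewFile.txt DevFile.txt
# Prepend a large comment so that much of DevFile.txt is new material.
{
    i=1
    while [ "$i" -le 20 ]; do
        echo "# NOT READY: dev-only copy, kept apart from NewFile.txt ($i)"
        i=$((i + 1))
    done
    cat DevFile.txt
} > tmpfile && mv tmpfile DevFile.txt
git add -A && git commit -q -m "dev: rename and add a big comment"

default=$(git diff --find-renames --name-status HEAD~1 HEAD)
low=$(git diff --find-renames=01 --name-status HEAD~1 HEAD)
printf 'default threshold:\n%s\n1%% threshold:\n%s\n' "$default" "$low"
```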

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 