How do I revert a specific commit that changed all files, causing lots of conflicts during revert?

Question

In a collaborative repository, someone made a commit that somehow made it appear like it re-added all files in the repository again.

Here is part of the history when printed with --date-order:

* 28cbf861 - (65 minutes ago) update logs and metadata (HEAD -> master, origin/master, origin/HEAD)
* e589776e - (18 hours ago) P2LTR15 update
...
* d4c61147 - (5 days ago) Delete P2STR03_SRC00150_Q0448_0000_0-7.log
* fa837509 - (5 days ago) Delete P2STR03_SRC00150_Q0427_0000_0-7.log
*   9a5e2300 - (5 days ago) git pull at 20180421 10am.
|\
* | 6567df4b - (5 days ago) add md5 3min
* | 6f7c80f7 - (5 days ago) Added md5sum files for all 3min SRCs
 /
* b97e834f - (6 days ago) Delete P2STR13_SRC00605_Q0325_0000_0-9.log
* 5cd9989b - (6 days ago) Delete P2STR13_SRC00605_Q0129_0000_0-9.log
* 769ae25d - (6 days ago) Delete P2STR13_SRC00605_Q0209_0000_0-9.log
...

The faulty commit is 6f7c80f7, and I would like to remove it but keep 6567df4b. The git pull commit (9a5e2300) is a simple merge commit.

When I print the history without --date-order, commit 6f7c80f7 is suddenly shown at the very bottom, i.e. as if it were the first commit ever.

* 28cbf861 - (70 minutes ago) update logs and metadata
* e589776e - (18 hours ago) P2LTR15 update
...
* fa837509 - (5 days ago) Delete P2STR03_SRC00150_Q0427_0000_0-7.log
*   9a5e2300 - (5 days ago) git pull at 20180421 10am.
|\
| * b97e834f - (6 days ago) Delete P2STR13_SRC00605_Q0325_0000_0-9.log
...
| * 697a103c - (7 months ago) initial commit
* 6567df4b - (5 days ago) add md5 3min
* 6f7c80f7 - (5 days ago) Added md5sum files for all 3min SRCs

I tried doing a git revert 6f7c80f7, but it produces a lot of conflicts that I don't know how to resolve. This is what git status shows afterwards:

On branch master
Your branch is up to date with 'origin/master'.

You are currently reverting commit 6f7c80f7.
  (fix conflicts and run "git revert --continue")
  (use "git revert --abort" to cancel the revert operation)

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    deleted:    P2LTR15/.gitkeep
    deleted:    P2LTR15/MOS_P2LTR15_MO.csv
    ... (this is basically a list of almost all the files in the repository)

Unmerged paths:
  (use "git reset HEAD <file>..." to unstage)
  (use "git add/rm <file>..." as appropriate to mark resolution)

    deleted by them: P2LTR15/P2LTR15.yaml
    deleted by them: P2LTR15/audioFrameInformation/P2LTR15_SRC00712_HRC021.afi
    ... (this is another list of files, I don't know why it is there, possibly those were actually changed)

Is there any other way I could rewrite the Git history in such a way that this bad commit never happened?

I've looked at this guide but it also only tells me to do a revert. Should I attempt to go back to a commit before that time, then cherry-pick commits and then force-rewrite the repository? I don't think that will be a nice option either.

Answer 1

You say that your git log output quote is only part of the history, but you also say that commit 6f7c80f7 adds all the files, and in the quoted text, 6f7c80f7 never shows any parent commit to which it connects.

This suggests that 6f7c80f7 is a root commit: an initial commit that, in effect, adds every file that it shows, because the "previous" commit does not exist, so every file must by definition be new in that commit.
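
One way to check this guess, as a minimal sketch: git rev-list --parents prints a commit followed by its parents, so a root commit yields a single hash. The script builds a throwaway repository so that it is self-contained; is_root is a hypothetical helper name, not a Git command, and the identity values are placeholders.

```shell
#!/bin/sh
# Sketch: detect a root commit (a commit with no parents) in a scratch repository.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the scratch repo
git config user.email "user@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "initial commit"
echo two >> a.txt
git commit -q -am "second commit"

# is_root is a hypothetical helper: it succeeds if the given commit has no parents.
is_root() {
    # Output is "<commit> <parent>..."; word splitting leaves one field for a root.
    set -- $(git rev-list --parents -n 1 "$1")
    [ "$#" -eq 1 ]
}

if is_root HEAD~1; then echo "HEAD~1 is a root commit"; fi
if ! is_root HEAD; then echo "HEAD has a parent"; fi
```

In the repository from the question, git rev-list --parents -n 1 6f7c80f7 printing only one hash would confirm the guess, and git rev-list --max-parents=0 master would list every root reachable from master.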

A graph can have more than one root. It's a little bit unusual in a Git commit graph since we normally make new commits by first checking out some existing commit, then making changes, then running git commit to save a new complete snapshot. But there are other ways:

1. We can use git fetch to obtain an unrelated repository, then git merge with --allow-unrelated-histories. If the Git version is old enough (before 2.9), git merge always allows unrelated histories, and does not have an --allow-unrelated-histories flag: it just merges these without warning. (You might see a lot of add/add conflict errors here and have to do a lot of hand resolving though!)
   This adds their root commit (the other repository's) to our own commit graph, which already had our root commit, so now we have two root commits.
2. We can use git checkout --orphan to set our own repository state up so that the next commit we make will be a root commit.
   (If and when we do make that root commit, we will have the same merge issues as with the "fetch unrelated repository" case.)
3. We can, accidentally or deliberately, fiddle with Git internals, using plumbing commands like git commit-tree or just poking at files inside .git, to create a new root commit.
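
The second route can be reproduced in a scratch repository. This is only a sketch of how such a graph might arise, not a claim about what actually happened in the question's repository; the branch name unrelated and the file names are made up, and --allow-unrelated-histories needs Git 2.9 or later.

```shell
#!/bin/sh
# Sketch: create a second root commit with --orphan and merge it back.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the scratch repo
git config user.email "user@example.com"

echo data > data.log
git add data.log
git commit -q -m "initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # whatever name git init chose

# --orphan keeps the index and work tree, but the next commit will have no parent.
git checkout -q --orphan unrelated
echo sum > data.md5
git add data.md5
git commit -q -m "Added md5sum files"          # second root; it "adds" data.log again too

git checkout -q "$main_branch"
git merge -q --allow-unrelated-histories -m "merge unrelated root" unrelated

git log --graph --oneline
echo "number of root commits:"
git rev-list --max-parents=0 HEAD | wc -l
```

The merge is clean here only because data.log is identical on both sides; with differing contents it would produce the add/add conflicts mentioned above.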

In any case, once we have this state, where there are multiple roots and a merge commit that merges the extra root, and we have built more commits atop the merge, we are sort of stuck with the situation. The merge commit (in your case, 9a5e2300) has a hash ID that depends on the existence of both chains of commits: the one leading back to the original root commit, and the one leading back to the new root commit. The hash ID of any commit that has 9a5e2300 as a parent (in your case, this is just the one commit fa837509) depends on the existence of 9a5e2300, because the parent's hash ID is part of the data from which the child's own ID is computed. The hash ID of the child of fa837509 (again there is just one commit in your graph here) then depends on the existence of fa837509; if the rest of the graph is linear, each child in turn depends on its parent ID, all the way to the end.
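
To see this dependency concretely, here is a small sketch in a scratch repository. It creates two commits with the same snapshot, message, identity and (fixed, arbitrary) timestamp, differing only in their parent, using the plumbing command git commit-tree. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch: the parent hash is part of the commit object, so it changes the commit's ID.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the scratch repo
git config user.email "user@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)
echo two > a.txt
git commit -q -am "second"
second=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')

# Fix both timestamps so that the parent is the only input that differs.
GIT_AUTHOR_DATE='1524304800 +0000'
GIT_COMMITTER_DATE='1524304800 +0000'
export GIT_AUTHOR_DATE GIT_COMMITTER_DATE

child_of_first=$(git commit-tree "$tree" -p "$first" -m "same snapshot")
child_of_second=$(git commit-tree "$tree" -p "$second" -m "same snapshot")

git cat-file -p "$child_of_first"          # note the "parent <hash>" line
echo "child of first:  $child_of_first"
echo "child of second: $child_of_second"
```

The two printed IDs differ even though both commits hold the same files, which is why replacing any commit forces new IDs for all of its descendants.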

Hence, if you were to somehow get rid of the extra root commit, your merge commit would need to have a different hash ID, which would mean every subsequent commit would also need to have a different hash ID. Moreover, if you did rid yourself of 6f7c80f7, the copy you would have to make of its current child 6567df4b would be another root commit! You would have to get rid of that one too, which would mean that there would be nothing for merge commit 9a5e2300 to merge, so you would probably want to ditch that as well, except that you'd want to somehow retain its source snapshot. Then you would need to take all of the remaining (presumably linear) chain, from fa837509 up to the tip, and copy each of those commits to new commits that do not depend on the existence of 9a5e2300, but do use whatever commit you made that has the updated source snapshot from the merge commit.

Hence, this leads to the process you would have to use to avoid having two root commits:

1. Copy the snapshot in merge 9a5e2300 to a new non-merge commit. This gets a new hash ID (because it's a new unique commit). Save that ID somewhere.
2. Copy the snapshot in fa837509 to a new commit, which likewise gets a new hash ID. The new commit's parent would be the ID you just saved. Save this ID somewhere.
3. Copy the snapshot in the child of fa837509 to a new commit, using the previous copy as the new parent.
4. Repeat for every commit until you reach the tip.
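
The steps above could be sketched with git commit-tree as follows. This is one possible way to do it, tried here on a scratch repository that imitates the shape of the question's graph, not a tested recipe for the real one. Assumptions: the second parent of the merge is the side with the original history (as the question's graph suggests), the chain after the merge is linear, and copy_commit and the branch names stray and rewritten are made-up names.

```shell
#!/bin/sh
# Sketch: rebuild a linear chain that no longer depends on the extra root.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the scratch repo
git config user.email "user@example.com"

# Original history.
echo log > a.log; git add a.log; git commit -q -m "initial commit"
main=$(git symbolic-ref --short HEAD)

# Stray history: a second root and its child, then a merge of the original history.
git checkout -q --orphan stray
echo sum > a.md5; git add a.md5; git commit -q -m "stray root"
echo sum2 > b.md5; git add b.md5; git commit -q -m "child of stray root"
git merge -q --allow-unrelated-histories -m "merge" "$main"
merge=$(git rev-parse HEAD)
echo x > c.log; git add c.log; git commit -q -m "later commit 1"
echo y > d.log; git add d.log; git commit -q -m "later commit 2"
tip=$(git rev-parse HEAD)

# copy_commit <commit> <new-parent>: hypothetical helper that reuses the commit's
# snapshot and message. Author and dates are NOT preserved in this sketch; that
# would need the GIT_AUTHOR_* and GIT_COMMITTER_* environment variables.
copy_commit() {
    git commit-tree "$1^{tree}" -p "$2" -m "$(git log -1 --format=%B "$1")"
}

keep=$(git rev-parse "$merge^2")           # assumption: original-history side
new=$(copy_commit "$merge" "$keep")        # step 1: merge snapshot as a non-merge commit
for c in $(git rev-list --reverse "$merge..$tip"); do
    new=$(copy_commit "$c" "$new")         # steps 2 to 4: copy each later commit in order
done

git branch rewritten "$new"
git log --graph --oneline rewritten
```

The rewritten branch ends with the same files as the old tip but has a single root and no merge. Nothing is moved or deleted by the script; the old chain stays reachable from its own branch until that name is changed.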

Once you have this linear chain of commits that do not depend on the commits you want to get rid of, you can then simply stop using the original chain of commits that does depend on the commits you want to get rid of. If you and everyone else do this, and you remove the names by which your Git finds the old chain, then it appears as if that old chain never existed. With the names for them gone, eventually the old commits really do get removed. (This happens when all the safety-checking timer stuff expires. These safety checks exist so that you can get old commits back if you make a mistake but catch it soon enough—typically 30 days—and so that Git can make objects that are not yet connected up into the commit graph, but will be within 14 days. Normal Git commands connect them up within a few milliseconds at worst, so 14 days is practically millennia.)
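
The grace period can be observed in a scratch repository. The sketch below creates a commit that no name refers to, shows that a plain git gc keeps it, and then removes it by overriding the expiry times. It assumes default gc settings; forcing expiry like this throws away the safety net described above, so it is shown only for illustration.

```shell
#!/bin/sh
# Sketch: unreferenced commits survive a default gc and vanish once the grace period is skipped.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the scratch repo
git config user.email "user@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "initial commit"

# A commit object that no branch, tag, or reflog entry refers to.
stray=$(git commit-tree 'HEAD^{tree}' -p HEAD -m "unreferenced commit")

git gc -q                                  # default settings: recent objects are kept
if git cat-file -e "$stray" 2>/dev/null; then
    survived_default_gc=yes
else
    survived_default_gc=no
fi
echo "present after default gc: $survived_default_gc"

git reflog expire --expire-unreachable=now --all   # drop reflog entries for unreachable commits
git gc -q --prune=now                              # skip the grace period
if git cat-file -e "$stray" 2>/dev/null; then
    echo "still present"
else
    echo "gone after gc --prune=now"
fi
```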

The above shows that it is possible to do what you are suggesting (though perhaps you would want to construct a slightly different replacement history than the one I described). Whether it's a good idea is another question entirely. The main drawback to rewriting history, however you do it (with git rebase, with git filter-branch, with The BFG Repo Cleaner, or with something you come up with on your own), is that the rewritten repository is, in effect, a new repository. Or at least, the new chain is new: the old stuff, from before the rewrite point, is the same. You must get everyone who has copies of the old repository to switch from the old stuff to the new stuff. If even a single person retains the old commits, and uses Git to fuse the old with the new (which Git is quite happy to do), all the old commits come right back. Now you have the old and the new all squished together into one big happy (?) repository and your problems have just gotten worse!
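
As a rough illustration of what "switching" means in practice, the sketch below uses a local bare repository and two clones, named alice and bob purely for the example. One clone publishes a rewritten history with a forced push; the other adopts it by resetting instead of merging. Note that the reset discards any local commits the second clone had on that branch.

```shell
#!/bin/sh
# Sketch: publish rewritten history and switch another clone over to it.
set -eu

work=$(mktemp -d)
cd "$work"
# Placeholder identity, exported so it applies to both clones.
GIT_AUTHOR_NAME="Example User";    GIT_AUTHOR_EMAIL="user@example.com"
GIT_COMMITTER_NAME="Example User"; GIT_COMMITTER_EMAIL="user@example.com"
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q --bare origin.git
git clone -q origin.git alice 2>/dev/null   # hides the "empty repository" warning

cd alice
echo v1 > file.txt; git add file.txt; git commit -q -m "good commit"
echo oops > file.txt; git commit -q -am "bad commit"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"
cd ..

git clone -q -b "$branch" origin.git bob    # bob now has the bad commit too

# alice replaces the bad commit and publishes the new chain.
cd alice
git reset -q --hard HEAD~1
echo v2 > file.txt; git commit -q -am "replacement commit"
git push -q --force-with-lease origin "$branch"
cd ..

# bob switches to the new chain instead of merging it into the old one.
cd bob
git fetch -q origin
git reset -q --hard "origin/$branch"
git log --oneline
```

Had the second clone merged origin's branch into its old branch at that point and pushed the result, the bad commit would have been tied back into the shared history.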

There is nothing fundamentally wrong with rewriting (some or all of) your commits into new history; you just need to (a) know how to do it and (b) be aware of this "everyone must switch" consequence. If no one else has the commits that you're replacing, this kind of rewrite is completely safe: there's no other person or other Git repository that will bring the old commits back into play. If everyone who does have the old commits understands how this works and plans to cooperate, this kind of history rewrite is mostly safe: it will go wrong only if someone makes a mistake (and then, depending on your own level of carefulness and/or paranoia, maybe only for them!). It's when you have naive users, who just let Git's default Borg-like action of integrating every bit of technology it ever comes across take its course, that this kind of rewrite makes a mess.

1 answer

solution1
2 ACCEPTED 2018-04-26 15:09:34
