
Why does Git merge the index file instead of completely overriding it when checking out a commit?

This is a question about Git internals. I use low-level commands here and don't use branches.

Setup

echo f1 > f1.txt
echo f1 > f1.txt
git add .
git commit -m "first"
...cf178d5

Now I want to create a new commit with one file, f3.txt, using the index and the write-tree command:

$ rm f1.txt f2.txt
$ echo ‘f3 content’ > f3.txt
$ git add .

So currently the index file and the directory contain only the new f3.txt file:

$ git ls-files -s
100644 [some hash] 0       f3.txt

$ ls
f3.txt

This is what causes the weird behavior later.

So I write the tree to the repository and update HEAD with the new commit hash:

LATEST_TREE_HASH=$( git write-tree )
echo $LATEST_TREE_HASH > .git/HEAD

If I now run git status I get:

$ git status
Not currently on any branch.
nothing to commit, working directory clean

Question

If I now check out the first commit, with two files f1.txt and f2.txt:

$ git checkout cf178d5
A       f3.txt                    <--------------- why?
HEAD is now at a27a75a... initial commit

Git works fine, but I believe it merges trees in the index instead of overriding. You can see from the git checkout output that it treats f3.txt as an added file, and if I check the index file contents:

$ git ls-files -s
100644 [some hash] 0       f1.txt
100644 [some hash] 0       f2.txt
100644 [some hash] 0       f3.txt

$ ls 
f1.txt f2.txt f3.txt

It shows three files. What is the reason for this behavior?

Edit: the question has changed enough to invalidate the previous response.

There's still a typo (f1.txt listed twice) and funky non-ASCII Unicode quote marks, but we can now see what is going wrong here:

 $ LATEST_TREE_HASH=$( git write-tree )
 $ echo $LATEST_TREE_HASH > .git/HEAD

This is a bit of a problem. As Mark Adelsberger noted in a comment, and as your script itself says by using the word TREE here, git write-tree writes a tree, not a commit.
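To make the distinction concrete, here is a minimal sketch that checks the object type each plumbing command produces. It runs in a throwaway repository; the temporary directory, the identity values, and the commit message are placeholders of mine, not part of the question. git commit-tree is the command that wraps a tree in a commit object:

```shell
#!/bin/sh
# Sketch: git write-tree yields a tree object, git commit-tree yields a commit.
set -e
repo=$(mktemp -d) && cd "$repo"        # throwaway repository (placeholder)
git init -q .
git config user.name Demo              # placeholder identity for commit-tree
git config user.email demo@example.com

echo 'f3 content' > f3.txt
git add f3.txt

tree=$(git write-tree)                 # writes the index as a tree object
tree_type=$(git cat-file -t "$tree")

commit=$(echo 'second' | git commit-tree "$tree")   # wraps the tree in a commit
commit_type=$(git cat-file -t "$commit")

echo "write-tree  -> $tree_type"
echo "commit-tree -> $commit_type"
```

Only the second hash is the kind of object that HEAD is expected to hold.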

Why this is a problem

What's in .git/HEAD is supposed to be exactly one of two things:

  • a string of the form ref: refs/heads/name, where name is a valid branch name, or
  • the hash ID of a commit object.

In turn, a branch name (a reference of the form refs/heads/name) must always point to a commit object, never to a blob, tree, or tag object.

This means that Git in general assumes that whatever comes out of .git/HEAD refers to a commit object. By writing this tree hash into .git/HEAD you've violated this assumption. However, to allow for "unborn branches", such as the state of an initial repository with no master yet, HEAD can contain the name of a branch that does not actually exist.
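As a rough illustration of these states, the sketch below inspects HEAD with git symbolic-ref and git rev-parse rather than reading the file directly, and then detaches HEAD onto a real commit with git update-ref --no-deref. The repository location, file name, and identity are placeholders I chose for the example:

```shell
#!/bin/sh
# Sketch: the legal states of HEAD, and a way to point it at a commit.
set -e
repo=$(mktemp -d) && cd "$repo"        # throwaway repository (placeholder)
git init -q .
git config user.name Demo              # placeholder identity
git config user.email demo@example.com

# Fresh repository: HEAD is symbolic and names a branch that does not exist yet.
branch_ref=$(git symbolic-ref HEAD)
if git rev-parse -q --verify HEAD >/dev/null; then unborn=no; else unborn=yes; fi

# Build a commit with plumbing only, then store its hash directly in HEAD.
echo f1 > f1.txt
git add f1.txt
commit=$(echo 'first' | git commit-tree "$(git write-tree)")
git update-ref --no-deref HEAD "$commit"

# HEAD is no longer symbolic (detached) and resolves to a commit object.
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi
head_type=$(git cat-file -t HEAD)

echo "initial HEAD: $branch_ref (unborn: $unborn)"
echo "after update-ref: detached=$detached, type=$head_type"
```

Going through git update-ref instead of echo keeps the hash a commit hash as long as the value passed in comes from git commit-tree or git commit.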

What happens next is, I think, not guaranteed. The git checkout command assumes that if HEAD contains a valid hash, it contains a commit hash, and the only other allowed possibility is that HEAD contains the name of an orphan branch. So we run git checkout target_hash, as in your example:

 git checkout cf178d5

Case 1: moving from commit to commit

Suppose HEAD contained a valid commit hash. Let's call this the old hash, as distinguished from the target commit hash. In this case, git checkout would:

  • Compare (recursively as needed for sub-trees) the contents of the tree of old to the contents of the tree of target. [1]
  • For each hash that must change, including being added or removed, check whether the index and/or work-tree file version in the current index and work-tree match those in old.
  • If all match, update the index hashes and copy the new files to the work-tree (or remove the files from the work-tree and remove the index entry, if appropriate).
  • Otherwise (some files don't match): complain and refuse the checkout.

Obviously --force disables the check, but this is the basic process by which both staged and unstaged modifications are carried from one checkout to another when switching branches without being in a "clean" state. The process is described in all its gory detail in the Two Tree Merge section of the git read-tree documentation.
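The commit-to-commit case can be sketched in a throwaway repository. Here the second commit is a real commit containing only f3.txt, and an extra file f4.txt (a name I made up for the illustration) is staged but not committed before checking out the first commit. The directory and identity are likewise placeholders:

```shell
#!/bin/sh
# Sketch: a staged addition is carried across a commit-to-commit checkout.
set -e
repo=$(mktemp -d) && cd "$repo"        # throwaway repository (placeholder)
git init -q .
git config user.name Demo              # placeholder identity
git config user.email demo@example.com

# First commit: f1.txt and f2.txt.
echo f1 > f1.txt
echo f2 > f2.txt
git add -A
git commit -q -m first
first=$(git rev-parse HEAD)

# Second commit: only f3.txt.
git rm -q f1.txt f2.txt
echo 'f3 content' > f3.txt
git add f3.txt
git commit -q -m second

# Stage a file that is in neither commit, then move to the first commit.
echo f4 > f4.txt
git add f4.txt
git checkout -q "$first"

files=$(echo $(git ls-files))          # join index paths with spaces
echo "index after checkout: $files"
```

f3.txt matches the old commit and is absent from the target, so it is removed; f4.txt is in neither tree, so the staged addition survives the switch.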

Case 2: moving from orphan branch to commit

The other possibility allowed by the rules is that you are currently on an orphan branch. In this case, there is no current commit. Most likely, Git simply uses the empty tree as if it were the tree of the current commit. It then follows the same rules as for case 1, which is now possible since it has a tree to compare against.

But this is, obviously, not guaranteed. If Git were to use the current (valid) tree stored in .git/HEAD as the base tree, instead of the empty tree, and then proceed as for case 1, you'd see your two files get removed. Follow all the sub-cases outlined in git read-tree with $H set to your existing tree, vs $H set to the empty tree. (I admit to not having done so, but I think this is where the behavior comes from. But see also the remark about case 3 in the read-tree documentation!)
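One way to try the two choices of $H is to run git read-tree -m directly against copies of the index. This is only a sketch of what the two-tree merge does with each base; it does not prove which base git checkout picks internally. The temporary directory, identity, and the index copy file names are placeholders of mine:

```shell
#!/bin/sh
# Sketch: two-tree merge with $H = empty tree versus $H = the f3-only tree.
set -e
repo=$(mktemp -d) && cd "$repo"        # throwaway repository (placeholder)
git init -q .
git config user.name Demo              # placeholder identity
git config user.email demo@example.com

# Target commit: f1.txt and f2.txt.
echo f1 > f1.txt
echo f2 > f2.txt
git add -A
git commit -q -m first
target_tree=$(git rev-parse 'HEAD^{tree}')

# Index and work-tree now hold only f3.txt, as in the question.
rm f1.txt f2.txt
echo 'f3 content' > f3.txt
git add -A
f3_tree=$(git write-tree)
empty_tree=$(git mktree </dev/null)    # writes and names the empty tree

# Merge into private copies of the index (-i: index only, no work-tree
# checks or updates), so the repository's real index stays untouched.
idx_empty="$repo/.git/index.empty-base"
idx_f3="$repo/.git/index.f3-base"
cp .git/index "$idx_empty"
cp .git/index "$idx_f3"
GIT_INDEX_FILE="$idx_empty" git read-tree -m -i "$empty_tree" "$target_tree"
GIT_INDEX_FILE="$idx_f3" git read-tree -m -i "$f3_tree" "$target_tree"

with_empty_base=$(echo $(GIT_INDEX_FILE="$idx_empty" git ls-files))
with_f3_base=$(echo $(GIT_INDEX_FILE="$idx_f3" git ls-files))
echo "H = empty tree: $with_empty_base"
echo "H = f3 tree:    $with_f3_base"
```

With the empty tree as $H, f3.txt is in the index but in neither tree, which the read-tree table lists as "keep index", so all three files end up in the index, matching the output in the question. With the f3-only tree as $H, the entry matches $H and is missing from the target, so it is dropped.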


[1] Git actually achieves this using a temporary index, stored in the index.lock file. If all goes well, the temporary index is renamed to become the regular index, unlocking the index in the process. If things go poorly, Git removes the temporary index.lock file, discarding the temporary index and unlocking the index.

Original answer (to somewhat different question)

There's another set of funky non-ASCII quote marks that made cut and paste of your instructions fail, so that when I made the normal first commit I ended up with two files:

$ git commit -m "second commit"
[master (root-commit) b9c7e4b] second commit
 2 files changed, 1 insertion(+)
 create mode 100644 f3.txt
 create mode 100644 f3.txtcontentecho
$ ls
f3.txt                  f3.txtcontentecho
$ git ls-files -s
100644 5927d85c2470d49403f56ce27afd8f74b1a42589 0       f3.txt
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       f3.txtcontentecho

But note that my ls vs ls-files -s output differs enormously from yours at this point:

So currently the index file and the directory contains only new f3.txt file:

 $ git ls-files -s
 100644 [some hash] 0       f3.txt
 $ ls
 f1.txt f2.txt

It's not at all clear to me why you would have files f1.txt and f2.txt in your work-tree now; I don't.

Now we create a commit with git commit-tree and run git checkout:

$ INITIAL_COMMIT_HASH=$( \
>     echo 'initial commit' | git commit-tree $INITIAL_TREE_HASH )
$ git checkout $INITIAL_COMMIT_HASH

but what I get is very different:

Note: checking out 'cd1bc16160c8a2814cd94bc8397230ffe5a16c22'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at cd1bc16... initial commit

and everything reads as I would expect (files f1.txt and f2.txt are in the work-tree and the index; neither of the f3 files is visible). Running git log --graph --all shows the expected two (disconnected) commits (both are root commits, with no parents).
