How to find all unmerged commits in master grouped by the branches they were created in?

I have to create code reviews from unmerged branches.

When looking for solutions, let's set aside the local-branch context, since this will run on a server: there will be just the origin remote, I will always run git fetch origin before any other command, and when we talk about branches we mean origin/branch-name.

If the setup were simple and each branch that originated from master continued on its own way, we could just run:

git rev-list origin/branch-name --not origin/master --no-merges

for each unmerged branch and add the resulting commits to that branch's review.

The problem arises when there are merges between 2-3 branches and work continues on some of them. As I said, I want to create the code review for each branch programmatically, and I don't want to include a commit in multiple reviews.

Mainly, the problem reduces to finding the original branch for each commit.
Or, to put it more simply: finding all unmerged commits grouped by the branch they were most probably created on.

Let's focus on a simple example:

      *    b4 - branch2's head
   *  |    a4 - branch1's head
   |  *    b3
   *  |    merge branch2 into branch1
*  |\ |    m3 - master's head
|  * \|    a3
|  |  |
|  |  *    b2
|  *  |    merge master into branch1
* /|  |    m2
|/ |  *    merge branch1 into branch2
|  * /|    a2
|  |/ |
|  |  *    b1
|  | /
|  |/
| /|
|/ |
|  *       a1
* /        m1
|/
|
*          start

and what I want to obtain is:

  • branch1: a1, a2, a3, a4
  • branch2: b1, b2, b3, b4

The best solution I found so far is to run:

git show-branch --topo-order --topics origin/master origin/branch1 origin/branch2

and parse the result:

* [master] m3
 ! [branch1] a4
  ! [branch2] b4
---
  + [branch2] b4
  + [branch2^] b3
 +  [branch1] a4
 ++ [branch2~2] b2
 -- [branch2~3] Merge branch 'branch1' into branch2
 ++ [branch2~4] b1
 +  [branch1~2] a3
 +  [branch1~4] a2
 ++ [branch1~5] a1
*++ [branch2~5] m1

Output interpretation is like this:

  1. The first n lines are the n branches analyzed.
  2. One separator line of dashes (--- here).
  3. One line for each commit, with a plus (or a minus for merge commits) in the n-th column if that commit is on the n-th branch.
  4. The last line is the merge base of all branches analyzed.

For point 3, the commit name resolution starts with a branch name and, from what I see, this branch corresponds to the branch the commits were created on, probably because the naming favors paths reached through first parents.

As I'm not interested in merge commits, I'll ignore them.

I'll then parse each branch-relative commit name (such as branch2~2) and obtain its hash with rev-parse.
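
The parsing described above could look roughly like the following. This is only a minimal sketch, not a tested tool: the function name group_show_branch is made up for illustration, it assumes the first listed branch (master) occupies the first flag column, and it assumes the output has exactly the layout shown in the sample.

```python
import re
from collections import defaultdict

# A commit line looks like " ++ [branch2~2] b2": flag columns, "[name]", subject.
COMMIT_LINE = re.compile(r"^([ *+\-!]+) \[([^\]]+)\] (.*)$")


def group_show_branch(output):
    """Group non-merge topic commits by the branch their name starts with.

    Returns {branch: [(rev_name, subject), ...]} in output order.
    Assumption: column 0 belongs to the reference branch (master).
    """
    lines = output.splitlines()
    # Commit lines begin after the separator line made only of dashes.
    start = 0
    for i, line in enumerate(lines):
        if line and set(line) == {"-"}:
            start = i + 1
            break
    groups = defaultdict(list)
    for line in lines[start:]:
        match = COMMIT_LINE.match(line)
        if not match:
            continue
        flags, name, subject = match.groups()
        if "-" in flags:       # '-' marks a merge commit: ignore it
            continue
        if flags[0] != " ":    # reachable from master (e.g. the merge base)
            continue
        # "branch2~2" or "branch2^" -> "branch2"
        branch = re.split(r"[~^]", name, maxsplit=1)[0]
        groups[branch].append((name, subject))
    return dict(groups)
```

Each rev_name in the result (for example branch2~2) can then be passed to git rev-parse to get the hash.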

How can I handle this situation?

The repository can be cloned with --mirror, which creates a bare repository that mirrors the original and can be updated with git remote update --prune, after which all the tags should be deleted for this feature (presumably so that git name-rev names commits after branches rather than tags).

I implemented it this way:
1. Get a list of branches not merged into master:

git branch --no-merged master

2. For each branch, get the list of revisions that are on that branch and not on master:

git rev-list branch1 --not master --no-merges

If the list is empty, remove the branch from the list of branches.
3. For each revision, determine the original branch with

git name-rev --name-only revisionHash1

and match the result against the regex ^([^\\~\\^]*)([\\~\\^].*)?$ . The first capture group is the branch name, the second is the path relative to the branch.
If the branch name found is not equal to the initial branch, remove the revision from the list.

At the end I obtain a list of branches and, for each of them, a list of commits.
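
Step 3 could be sketched like this. It is an illustration only: the helper names split_name_rev and keep_own_commits are hypothetical, and the regex is the one above written without the string-literal escaping.

```python
import re

# Same pattern as in step 3: branch name first, then an optional ~N / ^N suffix.
NAME_REV = re.compile(r"^([^~^]*)([~^].*)?$")


def split_name_rev(name):
    """Split a `git name-rev --name-only` result into (branch, relative path)."""
    match = NAME_REV.match(name)
    return match.group(1), match.group(2) or ""


def keep_own_commits(branch, named_revisions):
    """Keep only the hashes whose name-rev branch equals `branch`.

    named_revisions is an iterable of (hash, name-rev name) pairs.
    """
    return [
        commit
        for commit, name in named_revisions
        if split_name_rev(name)[0] == branch
    ]
```

For example, split_name_rev("branch2~3^2") separates the branch branch2 from the relative path ~3^2.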


After some more bash research, it can all be done in one line:

git rev-list --all --not master --no-merges | xargs -L1 git name-rev | grep -oE '[0-9a-f]{40}\s[^\~\^]*'

The result is output in the form

hash branch

which can be read, parsed, ordered, grouped, or whatever.
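
Grouping that output might look like the sketch below. The function name group_by_branch is hypothetical, and it assumes every useful line is exactly "hash branch" separated by whitespace; anything else is skipped.

```python
from collections import defaultdict


def group_by_branch(lines):
    """Turn "hash branch" lines into {branch: [hash, ...]}, keeping input order."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:   # skip blank or unexpected lines
            continue
        commit, branch = parts
        groups[branch].append(commit)
    return dict(groups)
```

The lines could come from the standard output of the pipeline above, for instance read through sys.stdin.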

If I grasp your problem space, I think you can use --sha1-name

git show-branch --topo-order --topics --sha1-name origin/master origin/branch1 origin/branch2

to list what you are interested in, then run the commits through git-what-branch

git-what-branch: Discover what branch a commit is on, or how it got to a named branch. This is a Perl script from Seth Robertson

and format the report to suit your needs?

There is no correct answer to this question because it is underspecified.

Git history is simply a directed acyclic graph (DAG), and it's generally impossible to determine semantic relationships between two arbitrary nodes in a DAG unless the nodes are sufficiently labeled. Unless you can guarantee that the commit messages in your example graph follow a reliable, machine-parseable pattern, the commits are not sufficiently labeled: it's impossible to automatically identify the commits you are interested in without additional context (e.g., guarantees that your developers follow certain best practices).

Here's an example of what I mean. You say that commit a1 is associated with branch1, but this can't be determined with certainty just by looking at the nodes of your example graph. It's possible that once upon a time your example repository history looked like this:

      *    merge branch1 into branch2 - branch2's head
      |\
     _|/
    / *    b1
   |  |
   |  |
  _|_/
 / |
|  *       a1
* /        m1
|/
|
*          start - master's head

Note that branch1 doesn't even exist yet in the above graph. The above graph could have arisen from the following sequence of events:

  1. branch2 is created at start in the shared repository
  2. user#1 creates a1 on his/her local branch2 branch
  3. meanwhile, user#2 creates m1 and b1 on his/her local branch2 branch
  4. user#1 pushes his/her local branch2 branch to the shared repository, causing the branch2 ref in the shared repository to point to a1
  5. user#2 tries to push his/her local branch2 branch to the shared repository, but this fails with a non-fast-forward error (branch2 currently points to a1 and can't be fast-forwarded to b1)
  6. user#2 runs git pull, merging a1 into b1
  7. user#2 runs git commit --amend -m "merge branch1 into branch2" for some inexplicable reason
  8. user#2 pushes, and the shared repository history ends up looking like the above DAG

Some time later, user#1 creates branch1 off of a1 and creates a2, while user#2 fast-forward merges m1 into master, resulting in the following commit history:

      *    merge a1 into b1 - branch2's head
   *  |\   a2 - branch1's head
   | _|/
   |/ *    b1
   |  |
   |  |
  _|_/
 / |
|  *       a1
* /        m1 - master's head
|/
|
*          start

Given that this sequence of events is technically possible (although unlikely), how can a human, let alone Git, tell you which commits "belong" to which branch?

Parsing Merge Commit Messages

If you can guarantee that users don't change merge commit messages (they always accept the Git default), and that Git has never and will never change the default merge commit message format, then the merge commit's commit message can be used as a clue that a1 started off on branch1. You'll have to write a script to parse the commit messages; there are no simple Git one-liners to do this for you.
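
Such a script might start from something like this. It is a rough sketch under the guarantees above: the pattern covers only the common default subjects of the form "Merge branch 'X' into Y" (plus the "remote-tracking" and "of <url>" variants), and the name parse_merge_subject is made up for illustration.

```python
import re

# Assumed shapes of default merge subjects:
#   Merge branch 'X'
#   Merge branch 'X' into Y
#   Merge remote-tracking branch 'origin/X' into Y
#   Merge branch 'X' of <url> into Y
MERGE_SUBJECT = re.compile(
    r"^Merge (?:remote-tracking )?branch '(?P<source>[^']+)'"
    r"(?: of (?P<url>\S+))?"
    r"(?: into (?P<target>\S+))?$"
)


def parse_merge_subject(subject):
    """Return (source, target) for a default merge subject, else None.

    target is None when the subject has no "into ..." part.
    """
    match = MERGE_SUBJECT.match(subject)
    if match is None:
        return None
    return match.group("source"), match.group("target")
```

A subject edited by hand, or produced by a differently configured tool, simply returns None and would have to be handled some other way.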

If Merges are Always Intentional

Alternatively, if your developers follow best practices (each merge is intentional and is meant to bring in a differently-named branch, resulting in a repository without those stupid merge commits created by git pull), and you are not interested in the commits from a completed child branch, then the commits you're interested in are on the first-parent path. If you know which branch is the parent of the branch you are analyzing, you can do the following:

git rev-list --first-parent --no-merges parent-branch-ref..branch-ref

This command lists the SHA1 identifiers of the commits that are reachable from branch-ref, excluding the commits reachable from parent-branch-ref and the commits that were merged in from child branches.

In your example graph above, assuming parent order is determined by your annotations and not by the order of the lines going into a merge commit, git rev-list --first-parent --no-merges master..branch1 would print the SHA1 identifiers for commits a4, a3, a2, and a1 (in that order; use --reverse if you want the opposite order), and git rev-list --first-parent --no-merges master..branch2 would print the SHA1 identifiers for commits b4, b3, b2, and b1 (again, in that order).
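
To make the first-parent idea concrete, here is a small in-memory model of what that command selects. It is a sketch over a hypothetical parents mapping (commit -> list of parents, first parent first), not a replacement for git rev-list, and the function names are invented for the example.

```python
def reachable(tip, parents):
    """All commits reachable from `tip`, following every parent."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def first_parent_commits(head, parents, excluded):
    """Follow first parents from `head`, skip merges, stop at `excluded`.

    `excluded` must be closed under ancestry (e.g. reachable(parent_tip)),
    so the walk can stop at the first excluded commit.
    """
    result = []
    commit = head
    while commit is not None and commit not in excluded:
        commit_parents = parents.get(commit, [])
        if len(commit_parents) < 2:   # mimic --no-merges
            result.append(commit)
        commit = commit_parents[0] if commit_parents else None
    return result
```

With a hypothetical history where a2 sits on top of a merge M whose parents are a1 (first) and x1 (second), both branching from m1, the call first_parent_commits("a2", parents, reachable("m1", parents)) keeps a2 and a1 and never visits x1.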

If Branches Have Clear Parent/Child Relationships

If your developers do not follow best practices and your branches are littered with those stupid merges created by git pull (or an equivalent operation), but you have clear parent/child branch relationships, then writing a script to perform the following algorithm may work for you:

  1. Find all commits reachable from the branch of interest, excluding all commits from its parent branch, its grandparent branch, and so on, and save the results. For example:

     git rev-list master..branch1 >commit-list
  2. Do the same for all child, grandchild, etc. branches of the branch of interest. For example, assuming branch2 is considered to be a child of branch1:

     git rev-list ^master ^branch1 branch2 >commits-to-filter-out
  3. Filter out the results of step #2 from the results of step #1. For example:

     grep -Fv -f commits-to-filter-out commit-list
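
If the script is written in Python rather than shell, step #3 reduces to an order-preserving set difference. A minimal sketch, assuming both inputs are lists of full commit hashes (the function name filter_commits is hypothetical):

```python
def filter_commits(commit_list, commits_to_filter_out):
    """Drop every hash listed in commits_to_filter_out, keeping original order."""
    excluded = set(commits_to_filter_out)
    return [commit for commit in commit_list if commit not in excluded]
```

Unlike grep -F, this compares whole hashes rather than substrings, which gives the same result as long as both lists contain full-length identifiers.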

The trouble with this approach is that once a child branch is merged into its parent, those commits are considered to be part of the parent even if development on the child branch continues. Although this makes sense semantically, it does not produce the result you say you want.

Some Best Practices

Here are some best practices to make this particular problem easier to solve in the future. Most if not all of these can be enforced via clever use of hooks in the shared repository.

  1. Only one task per branch. Multiple tasks are prohibited.
  2. NEVER permit development to continue on a child branch once it has been merged to its parent. Merging implies that a task is done, end of story. Answers to anticipated questions:
    • Q: What if I discover a bug in the child branch? A: Start a new branch off of the parent. Do NOT continue development on the child branch.
    • Q: What if the new feature isn't done yet? A: Then why did you merge the branch? Perhaps you merged a complete subtask; if so, the remaining subtasks should go on their own branches off of the parent branch. Do NOT continue development on the child branch.
  3. Forbid the use of git pull.
  4. A child branch must not be merged into its parent unless all of its own child branches have been merged into it.
  5. If the branch does not have any child branches, consider rebasing it onto the parent branch before merging with --no-ff. If it does have child branches, you can still rebase, but please preserve the --no-ff merges of the child branches (this is trickier than it should be).
  6. Merge the parent branch into the child branch frequently to make merge conflicts easier to resolve.
  7. Avoid merging a grandparent branch directly into its grandchild branch; merge the grandparent into the child first, then merge the child into the grandchild.

If all of your developers follow these rules, then a simple:

git rev-list --first-parent --no-merges parent-branch..child-branch

is all you need to see the commits that were made on that branch, minus the commits made on its child branches.

I would suggest doing it more or less the way you described. But I would work on the output of git log --format="%H:%P:%s" ^origin/master origin/branch1 origin/branch2, so you can do better tree-walking.

  1. Build a proper tree structure from the output, marking parents and children.
  2. Start walking from the heads (get their SHAs from git rev-parse). Mark every commit with the name of the head you came from and its distance.
    • For not-first-parent steps (the other side of a merge), I would add 100 to the distance.
    • If you meet a merge commit, check what its message says about which branch was merged into which. Use this information when following the two parent links: if the parsed name of the branch you are going to does not match your current head, add 10000 to the distance.
    • For both of the parents: you now know their names. Add all the children whose first parent they are to a dict: commit -> known-name.
  3. Take your dict of known-named commits and start walking up the tree (towards the children, not the parents). Subtract 10000 from the distance to the merged-into branch. During this walk, do not go to commits of which you are not the first parent, and stop as soon as you hit a branch point (a commit that has two children). Also stop if you hit one of your branch heads.

Now, for each of your commits, you will have a list of distance values (which might be negative) to your branch heads. For each commit, the branch with the least distance is the one the commit was most likely created on.
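
Step 1 of the list above could be sketched as follows. This covers only the parsing and the parent/child bookkeeping, not the distance heuristic, and the function name build_graph is hypothetical. It assumes one commit per line in the %H:%P:%s format.

```python
from collections import defaultdict


def build_graph(log_output):
    """Parse `git log --format="%H:%P:%s"` output.

    Returns (parents, children, subjects):
      parents[sha]  -> list of parent SHAs, first parent first
      children[sha] -> list of SHAs that have `sha` as a parent
      subjects[sha] -> commit subject line
    """
    parents = {}
    children = defaultdict(list)
    subjects = {}
    for line in log_output.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only: the subject may contain ':'.
        sha, parent_field, subject = line.split(":", 2)
        parents[sha] = parent_field.split()   # %P is space-separated
        subjects[sha] = subject
        for parent in parents[sha]:
            children[parent].append(sha)
    return parents, dict(children), subjects
```

Because the log is limited with ^origin/master, some parents will point at commits that are not in the output; they appear as keys of children but not of parents, which is a convenient way to detect the boundary.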

If you have time, you might want to walk the whole history and then subtract the history of master; that might give slightly better results if your branches have been merged into master before.

Couldn't resist: I made a Python script that does what I described, with one change: with every normal step, the distance is not increased but decreased. This has the effect that branches that lived longer after a merge point are preferred, which I personally like more. Here it is: https://gist.github.com/Chronial/5275577

Usage: simply run git-annotate-log.py ^origin/master origin/branch1 origin/branch2 and check the quality of the results (it outputs a git log tree with annotations).
