
How do I remove an author from a git repository?

If I create a Git repository and publish it publicly (e.g. on GitHub), and a contributor to the repository asks me to remove or obscure their name for whatever reason, is there an easy way of doing so?

Basically, I have had such a request and may want to replace their name and e-mail address with something like "Anonymous Contributor", or maybe a SHA-1 hash of their e-mail address.

Jeff is quite right, the right track is git filter-branch. It expects a script that manipulates the environment variables. For your use case, you probably want something like this:

git filter-branch --env-filter '
    # Replace the author identity on every commit written by this author.
    if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then
        export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="john@bugmenot.com"
    fi
    '

You can test that it works like this:

$ cd /tmp
$ mkdir filter-branch && cd filter-branch
$ git init
Initialized empty Git repository in /private/tmp/filter-branch/.git/
$ 
$ touch hi && git add . && git commit -m bla
[master (root-commit) 081f7f5] bla
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 hi
$ echo howdi >> hi && git commit -a -m bla
[master a466a18] bla
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git log
commit a466a18e4dc48908f7ba52f8a373dab49a6cfee4
Author: Niko Schwarz <niko.schwarz@gmail.com>
Date:   Thu Aug 12 09:43:44 2010 +0200

    bla

commit 081f7f50921edc703b55c04654218fe95d09dc3c
Author: Niko Schwarz <niko.schwarz@gmail.com>
Date:   Thu Aug 12 09:43:34 2010 +0200

    bla
$ 
$ git filter-branch --env-filter '
> if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then \
> export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="john@bugmenot.com"; \
> fi
> '
Rewrite a466a18e4dc48908f7ba52f8a373dab49a6cfee4 (2/2)
Ref 'refs/heads/master' was rewritten
$ git log
commit 5f0dfc0dc9a325a3f3aaf4575369f15b0fb21fe9
Author: Jon Doe <john@bugmenot.com>
Date:   Thu Aug 12 09:43:44 2010 +0200

    bla

commit 3cf865fa0a43d2343b4fb6c679c12fc23f7c6015
Author: Jon Doe <john@bugmenot.com>
Date:   Thu Aug 12 09:43:34 2010 +0200

    bla

Please beware. There's no way to delete the author's name without changing the hash of every rewritten commit and of all commits that descend from it. That will make later merging a pain for people who have been using your repository.
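The filter above only touches the author fields, so the contributor's name can still show up as the committer. Here is a minimal sketch that rewrites both, assuming a throwaway repository created just for the demonstration. The replacement identity "Anonymous Contributor" and the address anonymous@example.invalid are placeholders of mine, not part of the answer above. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print before running filter-branch.

```shell
set -e

# Throwaway repository; names and addresses are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Niko Schwarz"
git config user.email "niko.schwarz@gmail.com"
echo hi > hi
git add hi
git commit -q -m "first"

# Rewrite author and committer on every commit reachable from any ref.
# Matching on the e-mail address is an assumption; match on the name if
# that identifies the contributor more reliably in your history.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --env-filter '
    if [ "$GIT_AUTHOR_EMAIL" = "niko.schwarz@gmail.com" ]; then
        export GIT_AUTHOR_NAME="Anonymous Contributor"
        export GIT_AUTHOR_EMAIL="anonymous@example.invalid"
    fi
    if [ "$GIT_COMMITTER_EMAIL" = "niko.schwarz@gmail.com" ]; then
        export GIT_COMMITTER_NAME="Anonymous Contributor"
        export GIT_COMMITTER_EMAIL="anonymous@example.invalid"
    fi
' -- --all >/dev/null

# Show author and committer of the rewritten tip commit.
result=$(git log -1 --format='%an <%ae> / %cn <%ce>')
echo "$result"
```

The name can also appear in commit messages (for example in Signed-off-by trailers) and in file contents, which an env-filter does not change.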

If you ever have to "anonymize" a git repo not just for one user but for all users, Git 2.2 (November 2014) provides an interesting feature in the improved git fast-export:

See commit a872275 and commit 75d3d65 by Jeff King (peff):

teach fast-export an --anonymize option:

Sometimes users want to report a bug they experience on their repository, but they are not at liberty to share the contents of the repository.
It would be useful if they could produce a repository that has a similar shape to its history and tree, but without leaking any information.
This "anonymized" repository could then be shared with developers (assuming it still replicates the original problem).

This patch implements an "--anonymize" option to fast-export, which generates a stream that can recreate such a repository.
Producing a single stream makes it easy for the caller to verify that they are not leaking any useful information. You can get an overview of what will be shared by running a command like:

git fast-export --anonymize --all |
perl -pe 's/\d+/X/g' |
sort -u |
less

which will show every unique line we generate, modulo any numbers (each anonymized token is assigned a number, like "User 0", and we replace it consistently in the output).

In addition to anonymizing, this produces test cases that are relatively small (compared to the original repository) and fast to generate (compared to using filter-branch, or modifying the output of fast-export yourself).

Doc:

If the --anonymize option is given, git will attempt to remove all identifying information from the repository while still retaining enough of the original tree and history patterns to reproduce some bugs.

With this option, git will replace all refnames, paths, blob contents, commit and tag messages, names, and email addresses in the output with anonymized data.
Two instances of the same string will be replaced equivalently (e.g., two commits with the same author will have the same anonymized author in the output, but bear no resemblance to the original author string).
The relationship between commits, branches, and tags is retained, as well as the commit timestamps (but the commit messages and refnames bear no resemblance to the originals).
The relative makeup of the tree is retained (e.g., if you have a root tree with 10 files and 3 trees, so will the output), but their names and the contents of the files will be replaced.
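To see what that looks like end to end, here is a minimal sketch that pipes the anonymized stream straight into git fast-import in a fresh repository. The two temporary repositories and the sample commits are assumptions made for the demonstration; it needs Git 2.2 or later.

```shell
set -e

src=$(mktemp -d)   # repository with real identities
dst=$(mktemp -d)   # receives the anonymized history

git init -q "$src"
git -C "$src" config user.name "Niko Schwarz"
git -C "$src" config user.email "niko.schwarz@gmail.com"
echo hi > "$src/hi"
git -C "$src" add hi
git -C "$src" commit -q -m "first"
echo howdi >> "$src/hi"
git -C "$src" commit -q -a -m "second"

# Export everything anonymized and rebuild it in the empty repository.
git init -q "$dst"
git -C "$src" fast-export --anonymize --all |
    git -C "$dst" fast-import --quiet

# The history keeps its shape, but the identities are replaced.
src_count=$(git -C "$src" rev-list --count --all)
dst_count=$(git -C "$dst" rev-list --count --all)
authors=$(git -C "$dst" log --all --format='%an <%ae>' | sort -u)
echo "commits: $src_count -> $dst_count"
echo "$authors"
```

Note that this replaces file contents and messages as well, so the result is useful for sharing a bug reproduction, not as a replacement for the published repository.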


See also Git 2.28 (Q3 2020): "git fast-export --anonymize" learned to take a customized mapping, which lets users make its output more usable for debugging.

See commit f39ad38, commit 8a49495, commit 65b5d9f (25 Jun 2020), and commit d5bf91f, commit 6416a86, commit 55b0145, commit a0f6564, commit 7f40759, commit 750bb32, commit b897bf5, commit b8c0689 (23 Jun 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0a23331, 06 Jul 2020)

fast-export: allow seeding the anonymized mapping

Helped-by: Eric Sunshine
Signed-off-by: Jeff King

After you anonymize a repository, it can be hard to find which commits correspond between the original and the result, and thus hard to reproduce commands that triggered bugs in the original.

Let's make it possible to seed the anonymization map.
This lets users either:

  • mark names to be retained as-is, if they don't consider them secret (in which case their original commands would just work)
  • map names to new values, which lets them adapt the reproduction recipe to the new names without revealing the originals

The implementation is fairly straight-forward.
We already store each anonymized token in a hashmap (so that the same token appearing twice is converted to the same result). We can just introduce a new "seed" hashmap which is consulted first.

This does make a few more promises to the user about how we'll anonymize things (e.g., token-splitting pathnames). But it's unlikely that we'd want to change those rules, even if the actual anonymization of a single token changes. And it makes things much easier for the user, who can unblind only a directory name without having to specify each path within it.

One alternative to this approach would be to anonymize as we see fit, and then dump the whole refname and pathname mappings to a file. This does work, but it's a bit awkward to use (you have to manually dig the items you care about out of the mapping).

git fast-export now has:

--anonymize-map=<from>[:<to>]:

Convert token <from> to <to> in the anonymized output.
If <to> is omitted, map <from> to itself (i.e., do not anonymize it).

Reproducing some bugs may require referencing particular commits or paths, which becomes challenging after refnames and paths have been anonymized.
You can ask for a particular token to be left as-is or mapped to a new value.

For example, if you have a bug which reproduces with git rev-list sensitive -- secret.c, you can run:

$ git fast-export --anonymize --all \
      --anonymize-map=sensitive:foo \
      --anonymize-map=secret.c:bar.c \
      >stream

After importing the stream, you can then run git rev-list foo -- bar.c in the anonymized repository.

Note that paths and refnames are split into tokens at slash boundaries.
The command above would anonymize subdir/secret.c as something like path123/bar.c; you could then search for bar.c in the anonymized repository to determine the final pathname.

To make referencing the final pathname simpler, you can map each path component; so if you also anonymize subdir to publicdir, then the final pathname would be publicdir/bar.c.
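The per-component mapping can be tried out on a tiny repository. This is a sketch under stated assumptions: the branch name sensitive and the path subdir/secret.c mirror the documentation example, while the temporary repositories, the file contents, and the author identity are made up for the demonstration. It needs Git 2.28 or later, because older versions do not know --anonymize-map.

```shell
set -e

src=$(mktemp -d)
dst=$(mktemp -d)

# Source repository: one commit on branch "sensitive" touching subdir/secret.c.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/sensitive
git -C "$src" config user.name "Example Author"
git -C "$src" config user.email "author@example.invalid"
mkdir "$src/subdir"
echo 'int main(void) { return 0; }' > "$src/subdir/secret.c"
git -C "$src" add subdir/secret.c
git -C "$src" commit -q -m "add secret.c"

# Seed the mapping for the refname and for each path component.
git init -q "$dst"
git -C "$src" fast-export --anonymize --all \
    --anonymize-map=sensitive:foo \
    --anonymize-map=subdir:publicdir \
    --anonymize-map=secret.c:bar.c |
    git -C "$dst" fast-import --quiet

# The reproduction recipe now works with the mapped names.
paths=$(git -C "$dst" ls-tree -r --name-only foo)
hits=$(git -C "$dst" rev-list --count foo -- publicdir/bar.c)
echo "$paths"
echo "matching commits: $hits"
```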

You can make the change in your local repository, git commit --amend the appropriate commit (where you added the name), and then git push --force to update GitHub with your version of the repository.

The original commit with the contributor's name will still be available in the reflog until it expires, although it would take a lot of effort to find it. If this is a concern, you can obliterate that specific commit from the reflog too; see git help reflog for the syntax and how to find it in the list.
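For the case where the name is in the author field of the most recent commit, the amend-and-expire steps might look like the sketch below. It runs in a throwaway repository, the replacement identity is a placeholder, and it expires every reflog entry rather than a single one, which is more aggressive than what git help reflog allows you to do selectively.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Niko Schwarz"
git config user.email "niko.schwarz@gmail.com"
echo hi > hi
git add hi
git commit -q -m "first"
old=$(git rev-parse HEAD)

# Rewrite only the most recent commit, keeping its message.
# This changes the author; the committer stays the configured user.
git commit -q --amend --no-edit \
    --author="Anonymous Contributor <anonymous@example.invalid>"
author=$(git log -1 --format='%an <%ae>')

# Drop the reflog entries that still reference the old commit, then prune it.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$old" 2>/dev/null; then
    old_present=yes
else
    old_present=no
fi
echo "author: $author"
echo "old commit still present: $old_present"
# On a published branch this would be followed by: git push --force
```

This only cleans up the local repository; copies held by the hosting service or by other clones are not affected by the local reflog.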

If you want to change more than one commit, check out the man page for

git filter-branch --env-filter

You can use git-filter-branch to change the content or metadata of previous commits.

Note that since you're not dealing with a local branch (it's already been pushed to GitHub), you have no way to remove the author from the copy held by anyone who has already cloned your branch.

It's also generally bad practice to modify a branch which has already been published, since it can lead to confusion for people who are tracking the branch.
