Remove sensitive files and their commits from Git history
I would like to put a Git project on GitHub, but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for Capistrano).
I know I can add these filenames to .gitignore, but this would not remove their history within Git.
I also don't want to start over again by deleting the /.git directory.
Is there a way to remove all traces of a particular file in your Git history?
For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your git repository is entirely local or whether you already have a remote repository elsewhere; if it is remote and not secured from others, you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with the file gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.
With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:
Note for Windows users: use double quotes (") instead of single quotes in this command.
git filter-branch --index-filter \
'git update-index --remove PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' <introduction-revision-sha1>^..HEAD
git push --force --verbose --dry-run
git push --force
Update 2019:
This is the current code from the FAQ:
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" \
--prune-empty --tag-name-filter cat -- --all
git push --force --verbose --dry-run
git push --force
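To see what the FAQ command does before touching a real repository, here is a minimal sandbox sketch, assuming bash and git on PATH. The repository, the file config/deploy.rb and the password "hunter2" are hypothetical. FILTER_BRANCH_SQUELCH_WARNING only silences the "use filter-repo instead" warning (and pause) that newer Git versions print. The check uses --branches --tags rather than --all, because filter-branch keeps backups of the old refs under refs/original/, which --all would include.

```shell
#!/usr/bin/env bash
# Sandbox run of the FAQ's filter-branch command on a throwaway repo.
# Hypothetical: the repo, config/deploy.rb and the "hunter2" password.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning and its pause

git init -q repo && cd repo
echo "app" > README
git add README && git commit -q -m "initial"
mkdir config && echo 'set :password, "hunter2"' > config/deploy.rb
git add config/deploy.rb && git commit -q -m "add deploy config"
echo "more" >> README
git commit -q -am "unrelated change"
git tag v1   # rewritten too, thanks to --tag-name-filter cat

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch config/deploy.rb" \
  --prune-empty --tag-name-filter cat -- --all

# Commits reachable from branches or tags that still touch the file
# (refs/original/ backups are deliberately left out of this count).
remaining=$(git log --branches --tags --oneline -- config/deploy.rb | wc -l | tr -d ' ')
echo "commits still containing config/deploy.rb: $remaining"
```

Because of --prune-empty, the "add deploy config" commit becomes empty and disappears, so only two commits remain on the branch.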
Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.
To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.
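A minimal sandbox sketch of the collaborator's side, assuming bash and git on PATH. A local bare repository plays the remote; "alice", "bob", branch main and secret.txt are hypothetical. Bob recovers with fetch followed by reset --hard, which is only appropriate when he has no local commits to keep (otherwise, follow the git-rebase manpage section). The last check illustrates the warning at the top: the leaked commit is still in Bob's local object database afterwards.

```shell
#!/usr/bin/env bash
# What a collaborator sees after history is rewritten and force-pushed.
# Hypothetical: the "alice"/"bob" clones, branch main, secret.txt.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare "remote" whose default branch is main.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# Alice publishes a commit that accidentally contains a secret.
git clone -q remote.git alice 2>/dev/null   # empty-clone warning hidden
cd alice
git symbolic-ref HEAD refs/heads/main
echo "app" > README && echo "hunter2" > secret.txt
git add . && git commit -q -m "initial (oops, secret.txt)"
git push -q origin main
cd ..

# Bob clones before the fix.
git clone -q remote.git bob
leaked=$(git -C bob rev-parse HEAD)

# Alice rewrites the commit without the secret and force-pushes.
cd alice
git rm -q --cached secret.txt
git commit -q --amend -m "initial"
git push -q --force origin main
cd ..

# Bob's branch has diverged from origin/main: no fast-forward possible.
cd bob
git fetch -q origin
if git merge-base --is-ancestor HEAD origin/main; then ff=yes; else ff=no; fi
echo "fast-forward possible: $ff"

# Bob throws away his copy of the old history and adopts the new one.
git reset -q --hard origin/main
tracked=$(git ls-files)
echo "files tracked by bob now: $tracked"

# ...but the leaked commit is still stored locally (kept by his reflog).
if git cat-file -e "$leaked^{commit}" 2>/dev/null; then still=yes; else still=no; fi
echo "leaked commit still present locally: $still"
```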
Tip: Execute git rebase --interactive
In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:
git commit -a --amend
That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm.
If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase:
git rebase -i origin/master
That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, then save and quit. Git will walk through the changes and leave you at a spot where you can:
$EDITOR file-to-fix
git commit -a --amend
git rebase --continue
Repeat this for each commit with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
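The same "edit" flow can be scripted for a quick sandbox test. This sketch assumes bash, git and sed on PATH. A tag named published (hypothetical) stands in for origin/master, and GIT_SEQUENCE_EDITOR, which Git uses only for the rebase todo list, is pointed at a sed command instead of a human. File names and the password are made up.

```shell
#!/usr/bin/env bash
# Non-interactive run of the "pick -> edit" rebase flow.
# Hypothetical: tag "published", config.yml, the "hunter2" password.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true   # never open an interactive editor here

git init -q repo && cd repo
echo "app" > README && git add README && git commit -q -m "initial"
git tag published        # pretend everything up to here was pushed

echo "password: hunter2" > config.yml
git add config.yml && git commit -q -m "add config"
echo "docs" >> README && git commit -q -am "more work"

# Turn "pick" into "edit" on the todo line of the offending commit.
# (-i.bak is accepted by both GNU and BSD sed.)
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/add config/s/^pick/edit/'" \
  git rebase -q -i published

# Rebase stopped at "add config": fix the file, amend, continue.
echo "password: CHANGE_ME" > config.yml
git commit -q -a --amend --no-edit
git rebase --continue

# Commits on branches whose diff adds or removes the secret string.
hits=$(git log --branches -G hunter2 --oneline | wc -l | tr -d ' ')
kept=$(git rev-list --count published..HEAD)
echo "commits mentioning the secret: $hits, unpushed commits kept: $kept"
```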
Changing your passwords is a good idea, but for the process of removing passwords from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch explicitly designed for removing private data from Git repos.
Create a private.txt file listing the passwords, etc., that you want to remove (one entry per line) and then run this command:
$ java -jar bfg.jar --replace-text private.txt my-repo.git
All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
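Whichever tool you use, it helps to check which commits still contain the strings in private.txt. The sketch below is not part of the BFG: it runs plain git grep with -F (fixed strings) and -f (read patterns from a file) over every revision listed by git rev-list --all, in a hypothetical sandbox repo. It assumes bash and git on PATH; the secrets are made up.

```shell
#!/usr/bin/env bash
# Which revisions still contain any line of private.txt?
# Hypothetical: the repo, settings.ini and both secret strings.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
echo 'db_password = "hunter2"' > settings.ini
git add settings.ini && git commit -q -m "add settings"
echo 'db_password = ENV["DB_PASSWORD"]' > settings.ini
git commit -q -am "read password from environment"

# One secret per line; kept outside the repository on purpose.
printf 'hunter2\nsk_live_example\n' > ../private.txt

# Search every revision reachable from any ref; git grep exits 1 when
# nothing matches, so "|| true" keeps set -e from aborting the script.
matches=$(git grep -l -F -f ../private.txt $(git rev-list --all) || true)
echo "revision:file pairs still containing a secret:"
echo "$matches"
```

Here only the first commit is reported: the secret was removed from the latest commit, but it is still in history.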
The BFG is typically 10-50x faster than running git-filter-branch, and the options are simplified and tailored around these two common use-cases:
Full disclosure: I'm the author of the BFG Repo-Cleaner.
If you pushed to GitHub, force pushing is not enough: delete the repository or contact support
Even if you force push one second afterwards, it is not enough, as explained below.
The only valid courses of action are:
Is what leaked a changeable credential like a password?
yes: modify your passwords immediately, and consider using more OAuth and API keys!
no (naked pics):
do you care if all issues in the repository get nuked?
no: delete the repository
yes:
Force pushing a second later is not enough because:
GitHub keeps dangling commits for a long time.
GitHub staff does have the power to delete such dangling commits if you contact them, however.
I experienced this first hand when I uploaded all GitHub commit emails to a repo: they asked me to take it down, so I did, and they did a gc. Pull requests that contain the data have to be deleted, however: because of them, that repo data remained accessible for up to one year after the initial takedown.
Dangling commits can be seen either through:
One convenient way to then get the source at that commit is to use the download zip method, which can accept any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip
It is possible to fetch the missing SHAs either by:
listing API events with "type": "PushEvent". E.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback machine)
There are scrapers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly pool GitHub data and store it elsewhere.
I could not find out whether they scrape the actual commit diff, and that is unlikely because there would be too much data, but it is technically possible, and the NSA and friends likely have filters to archive only stuff linked to people or commits of interest.
If you delete the repository instead of just force pushing, however, commits do disappear even from the API immediately and give 404, for example:
https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824
This works even if you recreate another repository with the same name.
To test this out, I created a repo, https://github.com/cirosantilli/test-dangling, and did:
git init
git remote add origin git@github.com:cirosantilli/test-dangling.git
touch a
git add .
git commit -m 0
git push
touch b
git add .
git commit -m 1
git push
touch c
git rm b
git add .
git commit --amend --no-edit
git push -f
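The dangling commit from this experiment can be observed locally too. This sandbox sketch (bash and git assumed; no GitHub involved) repeats the same commands without the pushes and shows that the pre-amend commit is on no branch, yet can still be read by its SHA. That is the local counterpart of what GitHub keeps serving.

```shell
#!/usr/bin/env bash
# Local version of the test-dangling experiment, without any remote.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q test-dangling && cd test-dangling
touch a && git add . && git commit -q -m 0
touch b && git add . && git commit -q -m 1
old=$(git rev-parse HEAD)

touch c && git rm -q b && git add . && git commit -q --amend --no-edit
new=$(git rev-parse HEAD)

# The old commit is no longer contained in any branch...
on_branch=$(git branch --contains "$old" | wc -l | tr -d ' ')
# ...but its object is still stored and readable by SHA.
git cat-file -e "$old^{commit}" && present=yes || present=no
echo "old=$old new=$new on_branch=$on_branch present=$present"
```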
See also: How to remove a dangling commit from GitHub?
git filter-repo is now officially recommended over git filter-branch.
This is mentioned in the manpage of git filter-branch itself.
With git filter-repo, you can either remove certain files (see: Remove folder and its contents from git/GitHub's history):
pip install git-filter-repo
git filter-repo --path path/to/remove1 --path path/to/remove2 --invert-paths
This automatically removes commits that become empty.
Or you can replace certain strings (see: How to replace a string in a whole Git history?):
git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx')
I recommend this script by David Underhill; it worked like a charm for me.
In addition to natacado's filter-branch, it adds these commands to clean up the mess filter-branch leaves behind:
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune
Full script (all credit to David Underhill)
#!/bin/bash
set -o errexit
# Author: David Underhill
# Script to permanently delete files/folders from your git repository. To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2
if [ $# -eq 0 ]; then
exit 0
fi
# make sure we're at the root of git repo
if [ ! -d .git ]; then
echo "Error: must run this script from the root of a git repository"
exit 1
fi
# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch $files" HEAD
# remove the temporary history git-filter-branch
# otherwise leaves behind for a long time
rm -rf .git/refs/original/ && \
git reflog expire --all && \
git gc --aggressive --prune
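The last two commands may work better if changed to the following. Without an explicit date, git reflog expire keeps entries that are not yet old enough to expire (90 days by default, 30 for entries no longer reachable from the ref tip), and git gc --prune only deletes unreachable objects older than two weeks, so the removed data would linger for a while: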
git reflog expire --expire=now --all && \
git gc --aggressive --prune=now
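A small sandbox sketch of why these cleanup commands matter, assuming bash and git on PATH; the repo, secret.txt and its contents are hypothetical. It records the object ID of the secret's blob, runs the same filter-branch call as the script, and uses git cat-file -e to check whether that blob is still stored before and after the cleanup (the --expire=now / --prune=now variant).

```shell
#!/usr/bin/env bash
# After filter-branch, the secret blob survives until refs/original/
# is removed, the reflog is expired and gc prunes unreachable objects.
# Hypothetical: the repo and secret.txt.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q repo && cd repo
echo "app" > README && git add README && git commit -q -m "initial"
echo "hunter2" > secret.txt && git add secret.txt && git commit -q -m "oops"
blob=$(git rev-parse HEAD:secret.txt)   # object ID of the secret contents

git filter-branch --index-filter \
  "git rm -rf --cached --ignore-unmatch secret.txt" HEAD

has_blob() { git cat-file -e "$blob" 2>/dev/null && echo yes || echo no; }
before=$(has_blob)   # still kept alive by refs/original/ and the reflog

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now
after=$(has_blob)
echo "blob present after filter-branch: $before, after cleanup: $after"
```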
You can use git forget-blob.
The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:
https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/
It will disappear from all the commits in your history, reflog, tags and so on.
I run into the same problem every now and then, and every time I have to come back to this post and others; that's why I automated the process.
Credits to the contributors from Stack Overflow who allowed me to put this together.
Here is my solution in Windows:
git filter-branch --tree-filter "rm -f 'filedir/filename'" HEAD
git push --force
Make sure that the path is correct, otherwise it won't work.
I hope it helps.
Use filter-branch:
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch *file_path_relative_to_git_repo*' --prune-empty --tag-name-filter cat -- --all
git push origin *branch_name* -f
To be clear: the accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or really don't care about the history of your repo.
An alternative would be:
This will of course remove all commit history, branches, and issues from both your GitHub repo and your local git repo. If this is unacceptable, you will have to use an alternate approach.
Call this the nuclear option.
I've had to do this a few times to date. Note that this only works on one file at a time.
Get a list of all commits that modified a file. The one at the bottom will be the first commit:
git log --pretty=oneline --branches -- pathToFile
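To remove the file from history, take the first commit's sha1 and the path to the file from the previous command, and fill them into this command. The ^ after the sha1 selects its parent, so the commit that first added the file is rewritten too; with a plain <sha1>.. range, that commit would keep the file. (If the file was added in the very first commit, which has no parent, pass HEAD instead of a range.)
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <sha1-where-the-file-was-first-added>^..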
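Here are the two steps run end to end in a throwaway repo, assuming bash and git on PATH (keys.xml and the commits are hypothetical). Instead of reading the bottom of the log by eye, it uses --diff-filter=A to list only the commit(s) that added the file and takes the oldest. The range is written as "$first^..HEAD": without the ^, the commit that added the file would be excluded from the rewrite. The sketch assumes the file was not added in the root commit.

```shell
#!/usr/bin/env bash
# Find the commit that first added a file, then rewrite from there on.
# Hypothetical: keys.xml and the three commits below.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q repo && cd repo
echo "app" > README && git add README && git commit -q -m "initial"
echo "key=1" > keys.xml && git add keys.xml && git commit -q -m "add keys"
echo "key=2" > keys.xml && git commit -q -am "rotate key"

path=keys.xml
# --diff-filter=A keeps commits where the path was Added; oldest is last.
first=$(git log --diff-filter=A --format=%H --branches -- "$path" | tail -n 1)

# "^" = parent of $first, so $first itself is part of the rewrite.
git filter-branch --index-filter \
  "git rm --cached --ignore-unmatch $path" -- "$first^..HEAD"

# Ignore refs/original/ backups by looking at branches only.
left=$(git log --branches --oneline -- "$path" | wc -l | tr -d ' ')
echo "first commit adding $path: $first; commits still touching it: $left"
```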
In my Android project I had admob_keys.xml as a separate XML file in the app/src/main/res/values/ folder. To remove this sensitive file I used the script below, and it worked perfectly.
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch app/src/main/res/values/admob_keys.xml' \
--prune-empty --tag-name-filter cat -- --all
So, it looks something like this:
git rm --cached config/deploy.rb
echo /config/deploy.rb >> .gitignore
Remove the tracked file from git's index (cache) and add that file to the .gitignore list.
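A sandbox sketch of what these two commands do and do not achieve, assuming bash and git on PATH and hypothetical contents for config/deploy.rb: the file is no longer tracked and stays on disk, but the earlier commit still contains it, which is exactly the problem raised in the question.

```shell
#!/usr/bin/env bash
# "git rm --cached" + .gitignore stops tracking from the next commit on,
# but does not touch older commits. Hypothetical: the deploy.rb contents.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
mkdir config && echo 'set :password, "hunter2"' > config/deploy.rb
git add config/deploy.rb && git commit -q -m "add deploy config"

# Repo-relative path: a leading "/" here would be read as an absolute
# filesystem path, outside the repository.
git rm -q --cached config/deploy.rb
echo /config/deploy.rb >> .gitignore   # in .gitignore, "/" anchors to root
git add .gitignore && git commit -q -m "stop tracking deploy.rb"

tracked=$(git ls-files)                 # only .gitignore is left
on_disk=$([ -f config/deploy.rb ] && echo yes || echo no)
in_history=$(git show HEAD~1:config/deploy.rb)
echo "tracked: $tracked | on disk: $on_disk | in history: $in_history"
```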
filter-branch command
Example:
git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch *file_relative_path*' --prune-empty --tag-name-filter cat -- --all
The terms used above, and how they translate to filter-repo, are:
--prune-empty: If you just want to prune commits that become empty, then you don't even need to specify this flag with filter-repo, which does that by default. If you want to prune commits that started empty in your repo, then you need to specify --prune-empty always.
--tag-name-filter <command>: If you are just specifying --tag-name-filter cat, then the correct translation is to specify no extra flags. The fact that filter-branch required it was evidence of poor design; it should have been handled automatically. (To rename tags, filter-repo provides the --tag-rename option.)
-- --all: This was another piece of evidence that filter-branch was poorly designed, making users specify things that should have just been the default. Just drop it.
--index-filter <command>: This is the filter for rewriting the index. It is similar to the tree filter but does not check out the tree, which makes it much faster. To remove a path with filter-repo, use --path together with --invert-paths.
filter-repo command
git filter-repo is now recommended by the git project instead of git filter-branch, since filter-branch is extremely slow (multiple orders of magnitude slower than it should be) for non-trivial repositories.
Example:
git filter-repo --path *file_relative_path* --invert-paths
The (only) term here is:
--invert-paths: Invert the selection of files from the specified --path-{match,glob,regex} options, i.e. only select files matching none of those options.