How to remove/delete a large file from commit history in the Git repository?

I accidentally dropped a DVD-rip into a website project, then carelessly ran git commit -a -m... , and, zap, the repo was bloated by 2.2 gigs. The next time I made some edits, I deleted the video file and committed everything, but the compressed file is still there in the repository's history.

I know I can start branches from those commits and rebase one branch onto another. But what should I do to merge the two commits so that the big file doesn't show up in the history and is cleaned up by garbage collection?

Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.

Carefully follow the usage instructions; the core part is just this:

$ java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git

Any files over 100 MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then run git gc inside the repository to clean away the dead data:

$ git gc --prune=now --aggressive

The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

What you want to do is highly disruptive if you have published history to other developers. See “Recovering From Upstream Rebase” in the git rebase documentation for the steps necessary after repairing your history.

You have at least two options: git filter-branch and an interactive rebase, both explained below.

Using git filter-branch

I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.

Say your git history is:

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

Note that git lola is a non-standard but highly useful alias. (See the addendum at the end of this answer for details.) The --name-status switch to git log shows the tree modifications associated with each commit.

In the “Careless” commit (whose SHA1 object name is ce36c98), the file oops.iso is the DVD-rip that was added by accident and removed in the next commit, cb14efd. Using the technique described in the aforementioned blog post, the command to execute is:

git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all

Options:

  • --prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
  • -d names a temporary directory, which must not exist yet, to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm will result in faster execution.
  • --index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn't present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.
  • --tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
  • -- specifies the end of options to git filter-branch.
  • --all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (master), but I included this option for full generality.
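You can try the command before touching a real repository. Below is a minimal sketch that rebuilds the example history in a throwaway directory and runs the same filter-branch invocation.

It assumes a POSIX shell and git 2.24 or newer. In those versions, filter-branch prints a deprecation warning and pauses unless FILTER_BRANCH_SQUELCH_WARNING=1 is set. The demo identity, the add_commit helper, and the 1 MiB file of zeros standing in for the DVD-rip are illustrative only. The -d option is left out, so filter-branch uses its default scratch directory.

```shell
#!/bin/sh
# Sketch: rebuild the example history in a scratch repo, then strip oops.iso.
# Throwaway identity so commits work even without a global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip filter-branch's deprecation warning and pause (git >= 2.24).
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
git init -q "$demo/repo" && cd "$demo/repo" || exit 1

# Hypothetical helper: write one-line placeholder files and commit them.
add_commit() {
    msg=$1; shift
    for f in "$@"; do echo "$f" > "$f"; done
    git add -- "$@" && git commit -q -m "$msg"
}

add_commit "Index" index.html
add_commit "Admin page" admin.html
dd if=/dev/zero of=oops.iso bs=1024 count=1024 2>/dev/null  # 1 MiB stand-in
echo other > other.html
git add oops.iso other.html && git commit -q -m "Careless"
git rm -q oops.iso && git commit -q -m "Remove DVD-rip"
add_commit "Login page" login.html

# The command from the answer, minus -d (default scratch directory is used).
git filter-branch --prune-empty \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all >/dev/null

git log --format='%h %s' --name-status   # inspect the rewritten history
```

Afterwards the branch has four commits. “Remove DVD-rip” is dropped by --prune-empty because, once oops.iso is gone, it no longer changes the tree.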

After some churning, the history is now:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
|
| * f772d66 (refs/original/refs/heads/master) Login page
| | A   login.html
| * cb14efd Remove DVD-rip
| | D   oops.iso
| * ce36c98 Careless
|/  A   oops.iso
|   A   other.html
|
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

Notice that the new “Careless” commit adds only other.html and that the “Remove DVD-rip” commit is no longer on the master branch. The branch labeled refs/original/refs/heads/master contains your original commits in case you made a mistake. To remove it, follow the steps in “Checklist for Shrinking a Repository.”

$ git update-ref -d refs/original/refs/heads/master
$ git reflog expire --expire=now --all
$ git gc --prune=now
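The checklist commands above name refs/original/refs/heads/master explicitly. For repositories with several branches or tags, here is a hedged generalisation. The sketch uses a hypothetical helper, purge_backup_and_gc, that deletes every ref under refs/original/, expires all reflogs, and prunes. The demo then uses git cat-file -e to confirm that the blob of the removed file is really gone.

Same assumptions as before: a POSIX shell, git 2.24 or newer, and a throwaway repository.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the filter-branch pause

# Hypothetical helper: drop every filter-branch backup ref, expire all
# reflogs, and let gc delete whatever is no longer reachable.
purge_backup_and_gc() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    git reflog expire --expire=now --all
    git gc -q --prune=now
}

# Demo: commit a big file, delete it, rewrite it out of history.
demo=$(mktemp -d)
git init -q "$demo/repo" && cd "$demo/repo" || exit 1
echo hello > index.html
dd if=/dev/zero of=oops.iso bs=1024 count=1024 2>/dev/null
git add index.html oops.iso && git commit -q -m "Careless"
blob=$(git rev-parse HEAD:oops.iso)      # object id of the big file
git rm -q oops.iso && git commit -q -m "Remove DVD-rip"
git filter-branch --index-filter \
  "git rm -q --cached --ignore-unmatch oops.iso" -- --all >/dev/null

# Still present: the backup ref and the reflogs keep it reachable.
git cat-file -e "$blob" && echo "before: blob present"
purge_backup_and_gc
git cat-file -e "$blob" 2>/dev/null || echo "after: blob gone"
```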

For a simpler alternative, clone the repository to discard the unwanted bits.

$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo

Using a file:///... clone URL copies objects rather than only creating hardlinks.

Now your history is:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

The SHA1 object names for the first two commits (“Index” and “Admin page”) stayed the same because the filter operation did not modify those commits. “Careless” lost oops.iso and “Login page” got a new parent, so their SHA1s did change.

Interactive rebase

With a history of:

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

you want to remove oops.iso from “Careless” as though you never added it, and then “Remove DVD-rip” is useless to you. Thus, our plan going into an interactive rebase is to keep “Admin page,” edit “Careless,” and discard “Remove DVD-rip.”

Running git rebase -i 5af4522 starts an editor with the following contents.

pick ce36c98 Careless
pick cb14efd Remove DVD-rip
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

Executing our plan, we modify it to

edit ce36c98 Careless
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
# ...

That is, we delete the line with “Remove DVD-rip” and change the operation on “Careless” to edit rather than pick.

Saving and quitting the editor drops us at a command prompt with the following message.

Stopped at ce36c98... Careless
You can amend the commit now, with

        git commit --amend

Once you are satisfied with your changes, run

        git rebase --continue

As the message tells us, we are on the “Careless” commit we want to edit, so we run the following commands.

$ git rm --cached oops.iso
$ git commit --amend -C HEAD
$ git rebase --continue

The first removes the offending file from the index. The second amends “Careless” so that it records the updated index, and -C HEAD instructs git to reuse the old commit message. Finally, git rebase --continue goes ahead with the rest of the rebase operation.

This gives a history of:

$ git lola --name-status
* 93174be (HEAD, master) Login page
| A     login.html
* a570198 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

which is what you want.
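The same plan can also be scripted instead of editing the todo list by hand. This sketch rests on two assumptions:

- It runs in a throwaway repository that mirrors the example history.
- Commit subjects are unique, because the GIT_SEQUENCE_EDITOR sed script selects todo lines by subject. It turns pick into edit on the “Careless” line and deletes the “Remove DVD-rip” line.

The sed -i.bak form is used because GNU and BSD sed both accept it.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository mirroring the example history.
demo=$(mktemp -d)
git init -q "$demo/repo" && cd "$demo/repo" || exit 1
commit_file() { echo "$2" > "$2"; git add "$2"; git commit -q -m "$1"; }
commit_file "Index" index.html
commit_file "Admin page" admin.html
dd if=/dev/zero of=oops.iso bs=1024 count=1024 2>/dev/null
echo other > other.html
git add oops.iso other.html && git commit -q -m "Careless"
git rm -q oops.iso && git commit -q -m "Remove DVD-rip"
commit_file "Login page" login.html

base=$(git rev-parse HEAD~3)   # "Admin page", like 5af4522 above
# Rewrite the todo list without an editor: edit "Careless", drop "Remove DVD-rip".
GIT_SEQUENCE_EDITOR='sed -i.bak -e "/Careless/s/^pick/edit/" -e "/Remove DVD-rip/d"' \
    git rebase -i "$base"

# The rebase has stopped at "Careless": amend it without the big file.
git rm -q --cached oops.iso
git commit -q --amend -C HEAD
GIT_EDITOR=true git rebase --continue
rm -f oops.iso                 # untracked copy left behind by git rm --cached
git log --format='%h %s' --name-status
```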

Addendum: Enable git lola via ~/.gitconfig

Quoting Conrad Parker:

The best tip I learned at Scott Chacon's talk at linux.conf.au 2010, Git Wrangling - Advanced Tips and Tricks, was this alias:

 lol = log --graph --decorate --pretty=oneline --abbrev-commit

This provides a really nice graph of your tree, showing the branch structure of merges etc. Of course there are really nice GUI tools for showing such graphs, but the advantage of git lol is that it works on a console or over ssh, so it is useful for remote development, or native development on an embedded board …

So, just copy the following into ~/.gitconfig for your full color git lola action:

 [alias]
         lol = log --graph --decorate --pretty=oneline --abbrev-commit
         lola = log --graph --decorate --pretty=oneline --abbrev-commit --all
 [color]
         branch = auto
         diff = auto
         interactive = auto
         status = auto

Why not use this simple but powerful command?

git filter-branch --tree-filter 'rm -f DVD-rip' HEAD

The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called DVD-rip from every snapshot, whether it exists or not.

If you know which commit introduced the huge file (say 35dsa2), you can replace HEAD with 35dsa2^..HEAD to avoid rewriting too much history, thus avoiding diverging commits if you haven't pushed yet. The ^ matters: a range A..HEAD excludes A itself, so 35dsa2..HEAD would leave the file in the very commit that introduced it. This comment, courtesy of @alpha_989, seems too important to leave out here.

See this link.
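A hedged sketch of that tip follows, built around a hypothetical helper, strip_from_intro:

- It looks up the oldest commit on the current branch that added the path, using git log --diff-filter=A.
- It then rewrites only from that commit's parent onward, because a range A..HEAD leaves A itself untouched.
- It uses --index-filter instead of --tree-filter for speed.
- Its quoting assumes the path contains no single quotes.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the filter-branch pause

# Hypothetical helper: remove PATH from the commit that introduced it onward.
strip_from_intro() {
    path=$1
    # Oldest commit reachable from HEAD that added the path.
    intro=$(git log --diff-filter=A --format=%H -- "$path" | tail -n 1)
    [ -n "$intro" ] || { echo "no commit adds $path" >&2; return 1; }
    if git rev-parse -q --verify "$intro^" >/dev/null; then
        range="$intro^..HEAD"   # ^ so the introducing commit is rewritten too
    else
        range=HEAD              # it is the root commit: rewrite everything
    fi
    git filter-branch -f --index-filter \
        "git rm -q --cached --ignore-unmatch -- '$path'" -- "$range"
}

# Demo: big.bin arrives in the third of four commits.
demo=$(mktemp -d)
git init -q "$demo/repo" && cd "$demo/repo" || exit 1
for name in index admin; do
    echo "$name" > "$name.html" && git add "$name.html" && git commit -q -m "$name"
done
old_admin=$(git rev-parse HEAD)
dd if=/dev/zero of=big.bin bs=1024 count=512 2>/dev/null
git add big.bin && git commit -q -m "careless"
echo login > login.html && git add login.html && git commit -q -m "login"

strip_from_intro big.bin >/dev/null
git log --format='%h %s' --name-status
```

The commits before the introducing one keep their hashes, so only the tail of the branch diverges from what was pushed.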

(The best answer I've seen to this problem is https://stackoverflow.com/a/42544963/714112, copied here since this thread appears high in Google search rankings but that other one doesn't.)

🚀 A blazingly fast shell one-liner 🚀

This shell script displays all blob objects in the repository, sorted from smallest to largest.

For my sample repo, it ran about 100 times faster than the other ones found here.
On my trusty Athlon II X4 system, it handles the Linux Kernel repository with its 5,622,155 objects in just over a minute.

The Base Script

git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| awk '/^blob/ {print substr($0,6)}' \
| sort --numeric-sort --key=2 \
| cut --complement --characters=13-40 \
| numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

When you run the above code, you will get nice human-readable output like this:

...
0d99bb931299  530KiB path/to/some-image.jpg
2ba44098e28f   12MiB path/to/hires-image.png
bd1741ddce0d   63MiB path/to/some-video-1080p.mp4
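The base script relies on GNU coreutils (numfmt and cut --complement), which are not installed by default on every system, macOS for example. A possible portable variant is sketched below as a hypothetical largest_blobs function. It keeps the same git rev-list | git cat-file pipeline but:

- formats the output with awk,
- prints sizes in plain bytes,
- sorts largest first, so head can pick the top N.

```shell
#!/bin/sh
# Hypothetical helper: largest_blobs [N] lists the N largest blobs reachable
# from any ref (default 10), largest first, as "<bytes> <12-char id> <path>".
# Only git, awk, sort and head are needed (no numfmt or cut --complement).
largest_blobs() {
    git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" {
        size = $3; id = substr($2, 1, 12)
        sub(/^[^ ]* [^ ]* [^ ]* ?/, "")   # strip type, id, size: the path remains
        print size, id, $0
    }' |
    sort -k1,1nr |
    head -n "${1:-10}"
}

# Demo on a throwaway repo: a 200000-byte file that was later deleted.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
git init -q "$demo/repo" && cd "$demo/repo" || exit 1
echo small > small.txt
dd if=/dev/zero of=big.bin bs=1000 count=200 2>/dev/null
git add small.txt big.bin && git commit -q -m "add files"
git rm -q big.bin && git commit -q -m "remove big.bin"   # still in history
largest_blobs 5
```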

🚀 Fast File Removal 🚀

Suppose you then want to remove the files a and b from every commit reachable from HEAD. You can use this command:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch a b' HEAD

After trying virtually every answer on SO, I finally found this gem, which quickly removed the large files from my repository and allowed me to sync again: http://www.zyxware.com/articles/4027/how-to-delete-files-permanently-from-your-local-and-remote-git-repositories

cd to your local working folder and run the following command:

git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch FOLDERNAME" -- --all

Replace FOLDERNAME with the file or folder you wish to remove from the given git repository.

Once this is done, run the following commands to clean up the local repository:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

Now push all the changes to the remote repository:

git push --all --force

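This will clean up the branches on the remote repository. Note that --all pushes branches only; rewritten tags need a separate git push --tags --force.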
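To see what the force-push does, here is a local sketch in which a bare repository stands in for the server. That is an assumption: hosted services apply their own garbage-collection policies to unreachable objects.

After the rewrite, git push --all --force moves the remote branches to the new commits. Tags are not covered by --all, so they are pushed separately with --tags --force.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the filter-branch pause

demo=$(mktemp -d)
git init -q --bare "$demo/server.git"          # stand-in for the remote
git init -q "$demo/work" && cd "$demo/work" || exit 1
git remote add origin "$demo/server.git"

echo hi > index.html
dd if=/dev/zero of=big.bin bs=1024 count=256 2>/dev/null
git add index.html big.bin && git commit -q -m "careless"
git tag v1
git push -q --all origin && git push -q --tags origin
old_tip=$(git rev-parse HEAD)

# Rewrite local history (branches and tags), then overwrite the remote.
git filter-branch --index-filter \
  "git rm -q --cached --ignore-unmatch big.bin" \
  --tag-name-filter cat -- --all >/dev/null
git push -q --all --force origin
git push -q --tags --force origin
```

Anyone who cloned before the push still has the old commits, which is why collaborators have to re-clone or reset.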

100 times faster than git filter-branch, and simpler

There are very good answers in this thread, but many of them are now outdated. Using git-filter-branch is no longer recommended, because it is difficult to use and awfully slow on big repositories.

git-filter-repo is much faster and simpler to use.

git-filter-repo is a Python script, available on GitHub: https://github.com/newren/git-filter-repo. Once installed, it looks like a regular git command and can be called as git filter-repo.

You need only one file: the Python 3 script git-filter-repo. Copy it to a directory that is included in your PATH variable. On Windows you may have to change the first line of the script (see INSTALL.md). You need Python 3 installed on your system, but this is not a big deal.

First you can run

git filter-repo --analyze

This helps you determine what to do next. It writes its reports to .git/filter-repo/analysis and does not modify the repository.

You can delete your DVD-rip file everywhere:

git filter-repo --invert-paths --path-match DVD-rip
 

Filter-repo is really fast. A task that took around 9 hours on my computer with filter-branch was completed in 4 minutes by filter-repo. You can do many more nice things with filter-repo; refer to the documentation for that.

Warning: Do this on a copy of your repository. Many actions of filter-repo cannot be undone. filter-repo will change the commit hashes of all modified commits (of course) and of all their descendants, down to the latest commits!

These commands worked in my case:

git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

It is a little different from the versions above.

For those who need to push this to GitHub/Bitbucket (I only tested this with Bitbucket):

# WARNING!!!
# this will completely rewrite your bitbucket refs
# and will delete all remote branches that you don't have locally

git push --all --prune --force

# Once you've pushed, all your teammates need to clone the repository again
# git pull will not work

According to the GitHub documentation, just follow these steps:

  1. Get rid of the large file

Option 1: You don't want to keep the large file:

rm path/to/your/large/file        # delete the large file

Option 2: You want to keep the large file in an untracked directory:

mkdir large_files                       # create directory large_files
touch .gitignore                        # create .gitignore file if needed
echo '/large_files/' >> .gitignore      # ignore directory large_files
mv path/to/your/large/file large_files/ # move the large file into the ignored directory

  2. Save your changes

git add path/to/your/large/file   # stage the deletion
git add .gitignore                # (option 2 only) stage the new ignore rule
git commit -m 'delete large file' # commit the deletion

  3. Remove the large file from all commits

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch path/to/your/large/file" \
  --prune-empty --tag-name-filter cat -- --all
git push --force <remote> <branch>  # rewritten history needs a force-push

I ran into this with a Bitbucket account, where I had accidentally stored ginormous *.jpa backups of my site.

git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all

Replace MY-BIG-DIRECTORY-OR-FILE with the folder in question to completely rewrite your history (including tags).

Source: https://web.archive.org/web/20170727144429/http://naleid.com:80/blog/2012/01/17/finding-and-purging-big-files-from-git-history/

Just note that these commands can be very destructive. If more people are working on the repo, they'll all have to pull the new tree. The three middle commands are only needed if your goal is to reduce the size. filter-branch keeps a backup of the original history, including the removed file, under .git/refs/original, and it can stay there for a long time.

$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
$ git push origin master --force

git filter-branch --tree-filter 'rm -f path/to/file' HEAD worked pretty well for me, although I ran into the same problem as described here, which I solved by following this suggestion.

The Pro Git book has an entire chapter on rewriting history; have a look at the filter-branch/Removing a File from Every Commit section.

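If you know your commit is recent, then instead of traversing the entire history, do the following: git filter-branch --tree-filter 'rm -f LARGE_FILE.zip' HEAD~10..HEAD (the -f keeps the tree filter from failing on commits where the file does not exist).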

I basically did what was in this answer: https://stackoverflow.com/a/11032521/1286423

(For the record, I'll copy-paste it here.)

$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/ 
$ git reflog expire --all 
$ git gc --aggressive --prune
$ git push origin master --force

It didn't work, because I like to rename and move things a lot. So some big files were in folders that had been renamed, and I think gc couldn't delete those files because tree objects still referenced them. My ultimate solution to really kill them was:

# First, apply what's in the answer linked in the front
# and before doing the gc --prune --aggressive, do:

# Go back at the origin of the repository
git checkout -b newinit <sha1 of first commit>
# Create a parallel initial commit
git commit --amend
# go back on the master branch that has big file
# still referenced in history, even though 
# we thought we removed them.
git checkout master
# rebase on the newinit created earlier. By reapply patches,
# it will really forget about the references to hidden big files.
git rebase newinit

# Do the previous part (checkout + rebase) for each branch
# still connected to the original initial commit, 
# so we remove all the references.

# Remove the .git/logs folder, also containing references
# to commits that could make git gc not remove them.
rm -rf .git/logs/

# Then you can do a garbage collection,
# and the hidden files really will get gc'ed
git gc --prune --aggressive

My repo (the .git) went from 32MB to 388KB, a reduction that even filter-branch couldn't achieve.

This will remove it from your history:

git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch bigfile.txt' --prune-empty --tag-name-filter cat -- --all

Use Git Extensions, a UI tool. It has a plugin named "Find large files" which finds large files in repositories and allows you to remove them permanently.

Don't use 'git filter-branch' before using this tool, since the tool won't be able to find files removed by 'filter-branch' (although 'filter-branch' does not remove files completely from the repository pack files).

git filter-branch is a powerful command that you can use to delete a huge file from the commit history. The file will stay around for a while. Git removes it in a later garbage collection, once nothing (backup refs, reflogs) references it any more. Below is the full process for deleting files from commit history. For safety, the process first runs the commands on a new branch. If the result is what you need, reset the branch you actually want to change to it.

# Do it in a new testing branch
$ git checkout -b test

# Remove file-name from every commit on the new branch
# --index-filter, rewrite the index without checking out each tree
# --cached, remove the file from the index only, not from the working tree
# --ignore-unmatch, don't fail if the file is absent in a commit
# HEAD, rewrite every commit reachable from HEAD
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch file-name' HEAD

# If the output looks right, point master at the rewritten history
# (--hard also resets the index, so file-name is not left staged)
$ git checkout master
$ git reset --hard test

# Remove test branch
$ git branch -d test

# Push it with force
$ git push --force origin master

You can do this using the filter-branch command:

git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD

When you run into this problem, git rm will not suffice, as git remembers that the file once existed in our history, and thus will keep a reference to it.

To make things worse, rebasing is not easy either, because any references to the blob will prevent the git garbage collector from cleaning up the space. This includes remote references and reflog references.

I put together git forget-blob, a little script that tries to remove all these references, and then uses git filter-branch to rewrite every commit in the branch.

Once your blob is completely unreferenced, git gc will get rid of it.

The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

I put this together thanks to the answers from Stack Overflow and some blog entries. Credits to them!

Other than git filter-branch (slow but a pure git solution) and BFG (easier and very performant), there is also another filtering tool with good performance:

https://github.com/xoofx/git-rocket-filter

From its description:

The purpose of git-rocket-filter is similar to the command git-filter-branch while providing the following unique features:

  • Fast rewriting of commits and trees (by an order of x10 to x100).
  • Built-in support for both white-listing with --keep (keeps files or directories) and black-listing with --remove options.
  • Use of .gitignore-like patterns for tree-filtering
  • Fast and easy C# scripting for both commit filtering and tree filtering
  • Support for scripting in tree-filtering per file/directory pattern
  • Automatically prune empty/unchanged commits, including merge commits

NEW ANSWER THAT WORKS IN 2022.

DO NOT USE:

git filter-branch

This command might not change the remote repo after pushing. If you clone after using it, you may see that nothing has changed and the repo is still large. This command is old now; for example, if you follow the steps in https://github.com/18F/C2/issues/439, it won't work.

You need to use

git filter-repo

Steps:

(1) Find the largest files in .git. This reads the pack index files, so it only sees objects that have already been packed, for example by git gc:

git rev-list --objects --all | grep -f <(git verify-pack -v  .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)

(2) Start filtering out these large files:

 git filter-repo --path-glob '../../src/../..' --invert-paths --force

or

 git filter-repo --path-glob '*.zip' --invert-paths --force

or

 git filter-repo --path-glob '*.a' --invert-paths --force

or whatever you find in step 1.

(3) git filter-repo removes the origin remote after rewriting, as a safety measure, so add it back:

 git remote add origin git@github.com:.../...git

(4) Force-push the rewritten branches and tags:

git push --all --force

git push --tags --force

DONE!!!

This works perfectly for me in Git Extensions:

Right-click on the selected commit:

Reset current branch to here:

Hard reset.

It's surprising nobody else is able to give this simple answer.令人惊讶的是,没有其他人能够给出这个简单的答案。


git reset --soft HEAD~1

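It keeps the changes but removes the commit, and you can then commit those changes again. This only helps if the large file was added in the most recent commit. Unstage it first (git rm --cached) so it is not committed again.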
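When the large file arrived in the most recent commit and nothing has been pushed yet, amending that commit is a hedged alternative to the soft reset. The sketch below runs in a throwaway repository; big.iso is a placeholder name and the .gitignore step is optional. The replaced commit stays reachable from the reflog until it expires, so the disk space is only reclaimed later by garbage collection.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/repo" && cd "$demo/repo" || exit 1
echo hello > index.html && git add index.html && git commit -q -m "Index"
echo page > page.html
dd if=/dev/zero of=big.iso bs=1024 count=256 2>/dev/null
git add page.html big.iso && git commit -q -m "Add page"   # big.iso by mistake

git rm -q --cached big.iso      # untrack the file but keep it on disk
echo big.iso >> .gitignore      # optional: stop it from being re-added
git add .gitignore
git commit -q --amend -C HEAD   # replace the last commit, reusing its message
git show --stat --format='%h %s' HEAD
```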
