
How to use BFG Repo-Cleaner

I've been advised to use the BFG Repo-Cleaner because the local repo I want to push contains files that are too large to push to GitHub. I accidentally committed these files (above about 50 MB) a while back, and I don't mind if they get deleted.

The online instructions are here: https://rtyley.github.io/bfg-repo-cleaner/

They suggest cloning a fresh copy of my repo with the --mirror flag (this seems to be a copy of the online repo, not my local one), then running the java -jar bfg.jar... command, and after that cd-ing into that local mirror of the online repo and pushing the changes back.

I don't quite understand how this applies to local copies. For a local copy that is too big to push, should I, for example, do:

git clone --mirror /Users/me/myrepo

java -jar bfg.jar --strip-blobs-bigger-than 100M /Users/me/myrepomirror.git

I also don't understand how the next steps:

cd /Users/me/myrepomirror.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push

would do anything for my non-mirrored local repo:

/Users/me/myrepo

I'm not sure whether they mean that I should then run:

java -jar bfg.jar --strip-blobs-bigger-than 50M my-repo.git

And again, I don't see how this affects the actual repo (not a mirror or an online version) that I want to prune so that I can push it.

Perhaps I am being a bit dull? The documentation doesn't seem very explicit or extensive for something so potentially useful. Any help here would be great. Thanks!

I've never used BFG before. It sounds useful if you have large files that you need to remove. That said, I'll try to explain the overall process as I understand it.

Before we begin, note that BFG rewrites history: pushing the cleaned repository will rewrite the history of the remote, and everyone on your team will then need to re-clone it and move their local-only branches over.

According to git's documentation, git clone --mirror does the following:

Set up a mirror of the source repository. This implies --bare. Compared to --bare, --mirror not only maps local branches of the source to local branches of the target, it maps all refs (including remote-tracking branches, notes etc.) and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository.

This means the clone is an exact copy of the remote repository on your machine. As the BFG docs say, you should make a backup of this clone in case you need it later.
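The source of a mirror clone doesn't have to be a remote: if the oversized files were never pushed, you can mirror-clone your local repository by its path (for example /Users/me/myrepo). Below is a minimal shell sketch of the clone-plus-backup step. The function name mirror_with_backup and the ".backup" suffix are hypothetical choices of mine, not part of git or BFG.

```shell
# Hypothetical helper: create a bare mirror of SRC at DEST, plus a backup copy.
# SRC may be a URL or a local path such as /Users/me/myrepo.
mirror_with_backup() {
  local src=$1 dest=$2
  # --mirror implies --bare, copies every ref (branches, tags, notes, ...) and
  # records remote.origin.mirror=true in the new repository's config.
  git clone --mirror "$src" "$dest" || return 1
  # Keep an untouched copy in case the BFG rewrite has to be undone.
  cp -R "$dest" "$dest.backup"
}
```

For example, mirror_with_backup /Users/me/myrepo /Users/me/myrepomirror.git leaves both myrepomirror.git and myrepomirror.git.backup next to your original repository, which itself is not modified.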

java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git

This targets the clone you made with git clone --mirror and removes every file larger than 100 MB from all commits except the latest commit on your HEAD branch (as mentioned in the BFG docs). BFG won't delete the old data itself: it stops after rewriting the history so you can check that everything looks good, and leaves the rest of the cleanup to you.
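Before running BFG it can help to see which large files are actually in the history. The sketch below uses only plain git plumbing (rev-list and cat-file), not BFG, so it only approximates what --strip-blobs-bigger-than will target; list_large_blobs is a hypothetical helper name, and it takes the threshold in bytes. Keep in mind that BFG leaves your latest commit alone, so a big file that still exists there will survive; the BFG docs suggest deleting it in an ordinary commit first.

```shell
# Hypothetical helper: print "<size> <path>" for every blob anywhere in the
# history of REPO that is larger than MIN_BYTES, largest first.
#   rev-list --objects --all : every object reachable from any ref; blobs and
#                              trees are followed by the path they were found at
#   cat-file --batch-check   : adds type and size; %(rest) echoes the path back
#   awk                      : keeps big blobs and drops the type and id columns
list_large_blobs() {
  local repo=$1 min_bytes=$2
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '$1 == "blob" && $3 + 0 > min + 0 { sub(/^[^ ]+ [^ ]+ /, ""); print }' |
    sort -rn
}
```

For example, list_large_blobs /Users/me/myrepomirror.git 52428800 lists every blob over 50 MiB. Each blob appears once, under the first path rev-list found it at.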

cd /Users/me/myrepomirror.git 

This changes into the bare mirror repository; adjust the path to wherever you cloned it.

git reflog expire --expire=now --all && git gc --prune=now --aggressive

Let's break this command into its two logical parts:

  1. git reflog expire --expire=now --all
    • The expire subcommand prunes older reflog entries. The reflog records where the tips of branches and other references, including HEAD, have pointed over time. --expire=now tells git to expire every entry older than the current time, which in practice means all of them.
    • --all processes the reflogs of all references, rather than only the ones named on the command line.
  2. git gc --prune=now --aggressive
    • git gc handles garbage collection for git. Git normally runs it automatically (as git gc --auto) after certain commands, but running it by hand is useful when, as here, you want unreachable data removed right away.
    • --prune=now tells git gc to delete unreachable loose objects older than the current time (effectively all of them) instead of keeping them for the default two-week grace period.
    • --aggressive makes git gc spend much more time optimizing the repository (recomputing deltas more thoroughly) in exchange for a smaller result. The git gc docs have some additional info on it.
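The two commands can be wrapped in a small shell function. This is only a sketch: purge_unreachable is a hypothetical name, and the function simply runs the same two commands shown above against a given repository path.

```shell
# Hypothetical helper: permanently drop objects that nothing references any more.
# Step 1 empties every reflog, so old commits are no longer kept alive by it.
# Step 2 repacks what is still reachable and deletes the rest immediately;
# without --prune=now, gc would keep unreachable objects for its grace period.
purge_unreachable() {
  local repo=$1
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --prune=now --aggressive --quiet
}
```

For example, purge_unreachable /Users/me/myrepomirror.git after BFG has finished. Anything still reachable from a branch or tag is kept; only data that the rewrite left unreferenced is deleted.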

Once all of that is done, git push will overwrite the remote version of every branch with the newly cleaned one. Because the repository was cloned with --mirror, a plain git push pushes all refs and force-updates the ones that were rewritten.
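A minimal shell sketch of the push step, with a safety check added: it refuses to push from a repository that wasn't created with --mirror, since only then does a plain git push behave like git push --mirror. The name push_cleaned_mirror is hypothetical.

```shell
# Hypothetical helper: push a BFG-cleaned mirror back to its origin.
push_cleaned_mirror() {
  local mirror=$1
  # clone --mirror records remote.origin.mirror=true. With that set, a plain
  # "git push" pushes every ref, force-updates rewritten refs on the remote,
  # and removes remote refs that no longer exist locally.
  if [ "$(git -C "$mirror" config --get remote.origin.mirror)" != "true" ]; then
    echo "refusing: $mirror is not a --mirror clone" >&2
    return 1
  fi
  git -C "$mirror" push
}
```

The check matters because the push is forced: run from the wrong repository, it could overwrite remote branches you didn't mean to touch.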

You would then need to clone the repository again, into a different directory, with git clone to get a normal (non-bare) working copy.
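A shell sketch of that last step. The source can be the remote URL, as in the workflow above, or, if your original repository only ever existed locally, the cleaned mirror itself; in that case origin will point at the mirror, so the sketch optionally repoints it at the real remote. The function name fresh_working_copy and any URL you pass in are hypothetical placeholders.

```shell
# Hypothetical helper: get a normal working copy back after the cleanup.
#   SRC        : remote URL, or the path of the cleaned mirror
#   DEST       : new directory for the working copy
#   NEW_ORIGIN : optional URL to use as origin instead of SRC (e.g. GitHub)
fresh_working_copy() {
  local src=$1 dest=$2 new_origin=$3
  git clone "$src" "$dest" || return 1
  if [ -n "$new_origin" ]; then
    # Only rewrites the URL in the local config; nothing is contacted here.
    git -C "$dest" remote set-url origin "$new_origin"
  fi
}
```

After that, pushing from the new directory with git push -u origin followed by your branch name sends the cleaned history to the new origin.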

In short, we created a copy of the remote repository, removed the offending files (rewriting the commit history in the process), pushed the rewritten history to overwrite what was there before, and cloned a fresh copy of the repository to keep working in.

Preventative measures

I'd suggest some preventative measures to avoid having to keep removing these files. BFG shouldn't be run frequently, since it rewrites the repository's history.

Unfortunately, .gitignore can't ignore files based on their size. There are still a couple of options, though:

  1. If all of these large files have a particular file extension or live in a specific directory, simply add them to the .gitignore file to keep git from picking them up (this only affects files that aren't tracked yet).
  2. Create a pre-commit hook that prevents files above a certain size from being committed. There seems to be a script (I haven't tested it) in response to this SO post.
    • This is a client-side Git hook, and hooks aren't copied when a repository is cloned, so it will need to be distributed to the other developers on your team.
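For reference, here is a minimal pre-commit hook sketch. It is not the script from the SO post mentioned above. It rejects a commit if any staged file is larger than 50 MiB, the size at which GitHub starts warning (GitHub rejects files over 100 MiB). The function name and the limit are my own assumptions; save it as .git/hooks/pre-commit and make it executable.

```shell
#!/bin/sh
# Sketch of a client-side pre-commit hook (save as .git/hooks/pre-commit and
# chmod +x it). Rejects the commit if any staged file exceeds a size limit.

# reject_large_staged_files LIMIT_BYTES (hypothetical helper name)
# Checks added, copied and modified paths in the index; ":path" names the
# staged blob and "cat-file -s" prints its size. Paths containing quotes,
# tabs or newlines are still quoted by git and are skipped by this sketch.
reject_large_staged_files() {
  limit=$1
  too_big=$(git -c core.quotePath=false diff --cached --name-only --diff-filter=ACM |
    while IFS= read -r path; do
      size=$(git cat-file -s ":$path" 2>/dev/null) || continue
      if [ "$size" -gt "$limit" ]; then
        printf '%s (%s bytes)\n' "$path" "$size"
      fi
    done)
  if [ -n "$too_big" ]; then
    echo "pre-commit: staged files larger than $limit bytes:" >&2
    printf '%s\n' "$too_big" >&2
    return 1
  fi
  return 0
}

# Enforce only when running as the hook itself; 52428800 bytes = 50 MiB (assumed limit).
if [ "${0##*/}" = "pre-commit" ]; then
  reject_large_staged_files 52428800 || exit 1
fi
```

Git runs the hook before creating each commit, and a non-zero exit aborts the commit. git commit --no-verify skips it, so it is a convenience for your team rather than a guarantee.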
