
Can I flatten out deleted files from a git repository?

The git version control system is a kind of distributed log (with some conceptual similarities to the Raft consensus protocol).

Raft and some other systems have a concept of log compaction, so that redundant changesets don't bulk up the overall log of changes.

What I want is to 'bulk clean' deleted files, not to single out one file for exclusion.

My question is: Can I flatten out deleted files from a git repository?

EDIT:

  • Suppose that, at five different points in my history, someone checked in a different 100M binary file, and I'd rather not have to download all of that each time someone does a clone. I'm looking for a 'bulk clean of deleted files from my repo' whilst still keeping my repo.

"suppose in my history - I have five separate scenarios of someone checking in a 100M file - and I'd rather not have to download that each time someone does a clone."

Git already does this when the content is identical. As long as the file contents are the same, the hash will be the same. Git identifies file contents by their hash, so the same file checked in several times resolves to the same object and does not increase space usage.
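To make the point about hashes concrete, here is a minimal Python sketch of how a blob ID is derived. It assumes the default SHA-1 object format; the function name `git_blob_id` and the sample byte strings are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git would give a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical stand-ins for a binary file checked in at different times.
first_checkin = b"\x00\x01" * 1024
second_checkin = b"\x00\x01" * 1024   # the same bytes, committed again later
modified = second_checkin + b"\x02"   # one extra byte

# Identical content maps to one object ID, so it is stored only once.
print(git_blob_id(first_checkin) == git_blob_id(second_checkin))
# Any change in content produces a different ID, hence a new object.
print(git_blob_id(first_checkin) == git_blob_id(modified))
```

Note that the ID depends only on the bytes, not on the file name or the commit, which is why renames and repeated check-ins of the same file cost nothing extra.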

If, on the other hand, the file contents are slightly different, then space may or may not be saved, depending on where the files sit in the git tree and on the options used when a git gc is performed. (This supposes the files are diffable. Binary files may or may not be. Look up git delta compression.)

Having said all that, git in many ways does not work well with large binary files (I'm assuming that the 100 MB files are binary, though perhaps they are not), and you may want to look at something like Git Large File Storage (Git LFS), or something else within git that supports large files, or an SCM other than git.

Ok - here is the list of things to check:

You can run:

git gc

You can get information using:

git count-objects -v
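If you want the numbers in a script, the output is a list of `key: value` lines, with `size` and `size-pack` reported in KiB. A small sketch of reading it follows; the function name and the sample figures are invented for illustration, not taken from a real repository.

```python
def parse_count_objects(output: str) -> dict:
    """Turn `git count-objects -v` text into a dict of integer fields."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Keep only well-formed "key: <integer>" lines.
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


# Made-up sample output; in practice, capture it from the command above.
sample_output = """count: 12
size: 48
in-pack: 5310
packs: 1
size-pack: 512040
prune-packable: 0
garbage: 0
size-garbage: 0
"""

stats = parse_count_objects(sample_output)
# Loose objects plus packed objects, both in KiB.
total_kib = stats["size"] + stats["size-pack"]
print(f"repository objects: {total_kib / 1024:.1f} MiB")
```

Comparing this total before and after a git gc shows how much the repack actually saved.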

There is a script here for git-fatfiles.

This is a script for recreating all the branches in a new repo.

Using this you can list the big objects and sort them:

git verify-pack -v .git/objects/pack/pack-*.idx | sort -k3n
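The third column that `sort -k3n` keys on is the uncompressed object size, and the largest entries end up at the bottom. If you would rather get a "top N" list, the same output can be parsed as in this sketch. The names `PackEntry` and `largest_objects` and the sample lines (including the object IDs) are invented for illustration.

```python
from typing import List, NamedTuple


class PackEntry(NamedTuple):
    object_id: str
    kind: str
    size: int          # uncompressed size in bytes (column 3)
    size_in_pack: int  # bytes actually used inside the pack (column 4)


def largest_objects(verify_pack_output: str, limit: int = 10) -> List[PackEntry]:
    """Return the `limit` largest objects listed by `git verify-pack -v`."""
    entries = []
    for line in verify_pack_output.splitlines():
        fields = line.split()
        # Object lines look like: <id> <type> <size> <size-in-pack> <offset> ...
        # Summary lines ("non delta: ...", "<pack>: ok") do not match and are skipped.
        if len(fields) >= 5 and fields[1] in {"commit", "tree", "blob", "tag"}:
            entries.append(
                PackEntry(fields[0], fields[1], int(fields[2]), int(fields[3]))
            )
    return sorted(entries, key=lambda e: e.size, reverse=True)[:limit]


# Made-up sample; real input comes from the verify-pack command above.
sample_output = "\n".join([
    "a" * 40 + " blob   104857600 104012345 12",
    "b" * 40 + " commit 245 160 104012357",
    "c" * 40 + " blob   2048 900 104012517",
    "non delta: 3 objects",
    ".git/objects/pack/pack-example.pack: ok",
])

for entry in largest_objects(sample_output, limit=2):
    print(entry.object_id, entry.kind, entry.size)
```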

Using this you can find which commit had the blob that takes up the space.
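The command this step refers to is not shown here. As an assumption on my part, one common approach is to match the large object IDs against the output of `git rev-list --objects --all`, which prints each object ID followed by the path it was reached under; `git log --all --find-object=<id>` (Git 2.16 or later) can then list the commits that touch that blob. A sketch of the matching step, with a hypothetical function name and made-up IDs and paths:

```python
def paths_for_objects(rev_list_output: str, wanted_ids: set) -> dict:
    """Map each wanted object ID to the path printed by `git rev-list --objects --all`."""
    found = {}
    for line in rev_list_output.splitlines():
        # Lines are "<id>" for commits and "<id> <path>" for trees and blobs.
        # Paths may contain spaces, so split on the first space only.
        object_id, _, path = line.partition(" ")
        if path and object_id in wanted_ids:
            found.setdefault(object_id, path)
    return found


# Made-up sample; real input comes from `git rev-list --objects --all`.
sample_rev_list = "\n".join([
    "b" * 40,                              # a commit: no path
    "d" * 40 + " assets",                  # a tree
    "a" * 40 + " assets/big video.bin",    # the large blob
    "c" * 40 + " README",
])

big_ids = {"a" * 40}
for object_id, path in paths_for_objects(sample_rev_list, big_ids).items():
    print(object_id, path)
```

Knowing the path is usually enough to decide whether the file is one of the deleted binaries you want gone.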
