
How can I completely remove data from a git repository?

In my project, I had by mistake added some big image files to my repo. I read up on GitHub how to remove files from the history, and it did work: you cannot see the files in the history anymore. BUT then I made a tar.gz from my project for backup, and it is now twice the size it used to be! I haven't added anything else that could justify this increase, so my suspicion is that the repo data that used to represent the image files was not really thrown out of the repo. Can someone corroborate that? Is there a fix?

Edit to clarify: I know pretty little about git, so I took exactly the steps as indicated on the GitHub help pages, with the single exception that I had to use a force switch from the second file onwards, as in git filter-branch -f --index-filter ... .

To partially answer my own question, I think I could create a second git repo without the unwanted materials by

  • creating an empty repo in a different location
  • reproducing the file situation at different steps of my project, leaving out the unwanted files
  • and finally using that new repo instead of the old one to push materials to GitHub.

Has that been done before? Specifically, can I use that new git repo instead of the old one with the same project on GitHub?

BTW, for what it's worth, this is about a presentation I am writing right now; there is an image of the Tower of Babel in it that existed in several versions in high resolution, which explains the size of the problem (~100 MB of unwanted data).

Edit 2: thx a lot for the suggestions; I did

rm -rf .git/refs/original/
git reflog expire expire=now --all
git reflog expire --all
git gc --aggressive --prune=now

with the effect that the *.tar.gz size got smaller by a mere 0.5%...

Edit 3: it is daunting to experience the sheer complexity that is git. I'm giving up at this point. I did a test with a small throw-away repo: I did an initial commit, added a big file, did a commit, removed the file, and tried to erase its traces from memory with

rm very-big-file.xcf
git filter-branch --index-filter 'git rm --cached --ignore-unmatch very-big-file.xcf' --prune-empty -- --all
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune=now

These are the recorded *.tar.gz sizes, in bytes:

foo.tar.gz          7,518 
foo2.tar.gz    65,735,003 
foo3.tar.gz    32,777,155 

The big file's compressed size is 32,955,246 bytes, which makes it entirely plausible that it is still fully present under .git, maybe even in uncompressed form.
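Rather than inferring this from tarball sizes, the suspicion can be checked by asking git which objects it still stores. The sketch below assumes a git version that supports cat-file --batch-all-objects (2.6 or later); the function name largest_objects is a hypothetical helper of our own, not a git command. It lists every stored object, reachable or not, so a blob kept alive only by a reflog entry or a refs/original backup still shows up.

```shell
# largest_objects [N]: hypothetical helper (not a git command). Prints the
# N largest objects stored in the current repository as "size type sha",
# biggest first. --batch-all-objects walks every object in the object
# store, including unreachable ones that git log / rev-list --all skip.
# %(objectsize) is the uncompressed size of the object.
largest_objects() {
  n="${1:-10}"
  git cat-file --batch-all-objects \
      --batch-check='%(objectsize) %(objecttype) %(objectname)' |
    sort -n -r |
    head -n "$n"
}
```

Run it inside the repository (for example largest_objects 5). If a blob with roughly the image's size is still listed after the cleanup, it has not been pruned yet, typically because a reflog entry or a refs/original ref still points at it; git count-objects -vH gives the overall totals.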

GIT YU SO STUBBORN??

Isn't there any git purge extension to do this? I mean, git filter-branch --index-filter 'git rm --cached --ignore-unmatch very-big-file.xcf' --prune-empty -- --all is not exactly what I could type from memory when I have a slight hangover.
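Short of a dedicated tool, one option is to type the sequence once as a shell function and then call it by name. The sketch below is hypothetical (git_purge is our own name, not a git command). It runs the same filter-branch command, then deletes the refs/original backups, expires all reflog entries with --expire=now and garbage-collects with --prune=now. For the backups it uses git update-ref instead of rm -rf .git/refs/original/, an assumption made so that backup refs stored in packed-refs are removed as well.

```shell
# git_purge PATH: hypothetical helper (not a git command). Removes PATH
# from every commit of every ref, then discards everything that would
# otherwise keep the old blobs alive. It rewrites history: run it on a
# clean working tree and keep a backup copy of the repository.
git_purge() {
  path="$1"
  if [ -z "$path" ]; then
    echo "usage: git_purge PATH" >&2
    return 2
  fi
  # 1. Rewrite all refs without PATH; drop commits that become empty.
  #    PATH is spliced into the filter inside single quotes, so it must
  #    not itself contain a single quote.
  git filter-branch --index-filter \
      "git rm --cached --ignore-unmatch -- '$path'" \
      --prune-empty -- --all || return 1
  # 2. Delete the refs/original/* backups filter-branch leaves behind
  #    (works whether they are loose or packed; empty input is fine).
  git for-each-ref --format='delete %(refname)' refs/original |
    git update-ref --stdin || return 1
  # 3. Expire every reflog entry, not only those older than the default
  #    90/30-day limits, so no reflog still points at the old commits.
  git reflog expire --expire=now --all || return 1
  # 4. Repack and prune the now-unreferenced objects right away.
  git gc --prune=now
}
```

Usage would be git_purge very-big-file.xcf from the top of the repository, after committing or stashing any changes. Recent git versions print a warning and pause before running filter-branch; setting FILTER_BRANCH_SQUELCH_WARNING=1 skips it.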

A quick way is to get the history to look exactly like you want, add the repo as a remote of a new, empty one, and then just fetch. You will only get the references and the objects reachable from the history they represent.

You can now push this to a new GitHub repo.
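The fetch approach can be sketched as a small shell function. This is a hypothetical helper (clone_reachable is our own name, and so is the remote name old); it assumes OLD is a local path to the already-rewritten repository. With the default refspec, fetch transfers only OLD's branches plus tags pointing into them, so refs/original, reflogs and objects reachable only from those are left behind.

```shell
# clone_reachable OLD NEW: hypothetical helper (not a git command).
# Creates NEW as an empty repository, adds OLD as a remote named "old"
# and fetches from it. Only objects reachable from OLD's branches (and
# auto-followed tags) are copied into NEW.
clone_reachable() {
  # Resolve OLD to an absolute path, because the remote URL is used
  # from inside NEW.
  old=$(cd "$1" && pwd) || return 1
  new="$2"
  git init -q "$new" || return 1
  git -C "$new" remote add old "$old" || return 1
  git -C "$new" fetch -q old
}
```

After clone_reachable talk talk-clean, the fetched branches live under old/ (for example old/master, assuming that is the branch name), so something like git checkout -b master old/master inside talk-clean creates a local branch that can then be pushed to the new GitHub repo.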

Re "edit 3"... here's a complete sequence, which I actually logged and retried to eliminate typos this time. :-) Note that you can't filter-branch after removing the big file unless you commit that removal (which is kind of pointless for this example). Also note the --expire=now on git reflog expire: without it, recent reflog entries are kept, and they keep the big blob alive. Check the du -s output.

$ git init bigoop
Initialized empty Git repository in /tmp/bigoop/.git/
$ cd bigoop
$ echo tiny file with not much in it > tiny
$ git add tiny
$ git commit -m 'initial commit'
[master (root-commit) bd07e5a] initial commit
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 tiny
$ cp /path/to/huge/file hugefile
$ git add hugefile
$ git commit -m 'oops, add huge file'
[master 25cd764] oops, add giant file
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 hugefile
$ du -s .git
618992  .git
$ rm hugefile
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch hugefile' --prune-empty -- --all
Cannot rewrite branch(es) with a dirty working directory.
$ git checkout hugefile
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch hugefile' --prune-empty -- --all
Rewrite 25cd7647f49173fa8f42c0ca0a2ab8baf1842fca (2/2)rm 'hugefile'

Ref 'refs/heads/master' was rewritten
$ du -s .git
619012  .git
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 0 (delta 0)
$ du -s .git
140     .git

As for "GIT YU SO STUBBORN??" 至于“ GIT YU SO STUBBORN ??” ... it really works hard not to lose stuff. ...不丢失东西真的很努力。 Even when you're trying to make it lose stuff. 即使当您试图使其丢失时,也是如此。 :-) :-)
