简体   繁体   中英

Remove old commit information from a git repository to save space

I have a repository for storing some large binary files (tifs, jpgs, pdfs) that is growing pretty large. There is also a fair amount of files that are created, removed, and renamed and I don't care about the individual commit history. This question is somewhat simplified because I'm dealing with a repository that has no branches and no tags.

I'm curious if there's an easy way to remove some of the history from the system to save space.

I found an old thread on the git mailing list but it doesn't really specify how to use this (ie what the $drop is):

git filter-branch --parent-filter "sed -e 's/-p $drop//'" \
        --tag-name-filter cat -- \
        --all ^$drop 

I think, you can shrink your history following this answer:

How to delete a specific revision of a github gist?

Decide on which points in history, you want to keep.

pick <hash1> <commit message>
pick <hash2> <commit message>
pick <hash3> <commit message>   <- keep
pick <hash4> <commit message>
pick <hash5> <commit message>
pick <hash6> <commit message>   <- keep
pick <hash7> <commit message>
pick <hash8> <commit message>
pick <hash9> <commit message>
pick <hash10> <commit message>  <- keep

Then, leave the first after each "keep" as "pick" and mark the others as "squash".

pick   <hash1> <commit message>
squash <hash2> <commit message>
squash <hash3> <commit message>   <- keep
pick   <hash4> <commit message>
squash <hash5> <commit message>
squash <hash6> <commit message>   <- keep
pick   <hash7> <commit message>
squash <hash8> <commit message>
squash <hash9> <commit message>
squash <hash10> <commit message>  <- keep

Then, run the rebase by saving and quitting the editor. At each "keep" point, the message editor will pop up for a combined commit message ranging from the previous "pick" up to the "keep" commit. You can then either just keep the last message or in fact combine those to document the original history without keeping all intermediate states.

After that rebase, the intermediate file data will still be in the repository but now unreferenced. git gc will now indeed get you rid of that data.

You could always just delete .git and do a fresh git init with one initial commit. This will, of course, remove all commit history.

$drop is a variable (that you want to looking for)

If you want to clean up unnecessary files and optimize the local repository you must check the command git gc

And git prune is another option because it removes objects that are no longer pointed to by any object in any reachable branch.

I hope this could help you.

If you want to find and remove large files from your Git history, Pro Git has a section called Removing Objects , which guides you through this process. It's a bit complicated, but it would allow you to remove files from your history that you have deleted anyway, while keeping the rest of your history intact.

It is a bit complicated to have git forget about a file.

git rm will only remove the file on this branch from now on, but it remains in history and git will remember it.

The right way to do it is with git filter-branch , as others have mentioned here. It will rewrite every commit in the history of the branch to delete that file.

But, even after doing that, git can remember it because there can be references to it in reflog, remotes, tags and such.

I wrote a little utility called git forget-blob

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

It is easy, just do git forget-blob file1.txt .

This will remove every reference, do git filter-branch , and finally run the git garbage collector git gc to completely get rid of this file in your repo.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM