I know others have asked similar questions, but I am looking for the gritty details here. I am very familiar with git, so I am not asking why I would clone vs. pull, or how to use different git workflows. I am more interested in the underlying plumbing.
I have inherited an enormous repo (400MB) and have been tasked with cleaning it up.
I have used `git filter-branch` and `git gc --aggressive` to remove all references to any large files that don't need to be in the repo. These commands have worked swimmingly. However, there is some interesting behavior that I would like to understand.
After `git push --all --force <remote>`, I blew everything away and did a fresh `git clone <remote>`, hoping that only the filtered history would get pulled down. Unfortunately, the size was still around 300MB, and listing the trees with `git ls-tree` still showed blobs for the large files that I had removed.
Next, I started from an empty repository (`git init`), then ran `git remote add <url>`, then `git pull <remote>`. Note: this is the same remote as above, the one with the filter-branch results pushed up to it. This time, the pull only brought down the small set of changes, and the entire repo was much smaller: around 37MB! That's what I was looking for!
So, as you can see, `git clone` is doing a lot more than just pulling down code changes on master and setting up remote-tracking branches. What more is it doing? Why is it doing it? And how do I completely clean up my remote so that a clone results in the smaller size?
You are already on the right track there.
The difference is in which refs, and therefore which objects, the two operations transfer. `git pull` is a `git fetch` followed by a merge: it operates inside an existing local repository and downloads the commits, trees and blobs reachable from the branches it fetches, plus any tags that point into that history. `git clone`, however, builds a whole new repository: it fetches every branch and every tag on the remote together with everything reachable from them, sets up remote-tracking branches, and creates a remote named "origin" that refers to the original repository. So if a tag or another ref on the remote still points at the pre-filter history (`git push --all` pushes branches, not tags), a clone brings the large blobs down again, while a pull of the rewritten branch does not.
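To see how a clone and an init-plus-pull can end up with different objects from the same remote, here is a minimal sketch. Everything in it is an assumption made for illustration: the temporary directory, the branch name `main`, the tag `old-release`, and the use of `git commit --amend` as a stand-in for the `filter-branch` rewrite. It may not match what is actually on your remote, but it shows the mechanism.

```shell
#!/bin/sh
# Sketch only: all names here (work, remote.git, old-release) are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"

# A throwaway repository standing in for the rewritten project.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/main      # fix the branch name for the demo
git config user.email demo@example.com
git config user.name demo
echo big > big.bin
echo small > small.txt
git add .
git commit -q -m "original history"
git tag old-release                        # a tag that never gets rewritten
big_blob=$(git rev-parse HEAD:big.bin)

# Stand-in for filter-branch: replace the commit with one that lacks big.bin.
git rm -q big.bin
git commit -q --amend -m "rewritten history"
cd ..

# Publish the branch and the stale tag to a bare "remote".
git clone -q --bare work remote.git
remote="file://$demo/remote.git"

# Path 1: a fresh clone fetches every branch and every tag.
git clone -q "$remote" via-clone

# Path 2: an empty repository that pulls a single branch.
git init -q via-pull
git -C via-pull remote add origin "$remote"
git -C via-pull pull -q origin main

# Report whether the old large blob exists in a given repository.
has_blob() {
    if git -C "$1" cat-file -e "$big_blob" 2>/dev/null; then echo yes; else echo no; fi
}
clone_has=$(has_blob via-clone)
pull_has=$(has_blob via-pull)
echo "clone has old blob: $clone_has"
echo "pull has old blob:  $pull_has"
```

On a real remote, `git ls-remote <remote>` lists every ref a clone would fetch, which is a quick way to spot tags or branches that still point at the old history.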
If you want to start from scratch, I would suggest that you save your files and code and then delete the remote repository. Then create a new one, create the necessary branches, push all the files, and clone the repo. Otherwise, if you wish to keep the repo, make sure to delete unnecessary branches and tags, merge where possible, and remove leftover backup refs and history (for example the `refs/original/` refs that filter-branch keeps, and old reflog entries). The following link shows different ways of efficiently decreasing the size of a repository.
https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html
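As a rough sketch of the local cleanup after a `filter-branch` run, the steps below follow the checklist in the `git filter-branch` documentation: drop the backup refs, expire the reflogs, then prune. The demo repository, file names, and the `--index-filter` used here are made up for illustration and are not the commands from the question.

```shell
#!/bin/sh
# Sketch only: the repository and file names are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo

echo big > big.bin
echo small > small.txt
git add .
git commit -q -m "add files"
big_blob=$(git rev-parse HEAD:big.bin)

# Rewrite history without big.bin (the variable silences filter-branch's startup warning).
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

# filter-branch keeps the pre-rewrite commits under refs/original/,
# so the old blob is still in the object store at this point.
if git cat-file -e "$big_blob" 2>/dev/null; then before=present; else before=gone; fi

# 1. Drop the backup refs.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
# 2. Expire reflog entries that still mention the old commits.
git reflog expire --expire=now --all
# 3. Delete objects that are now unreachable.
git gc -q --prune=now

if git cat-file -e "$big_blob" 2>/dev/null; then after=present; else after=gone; fi
echo "old blob before cleanup: $before"
echo "old blob after cleanup:  $after"
```

This only shrinks the local repository. On the remote, any tag or branch that still points at the old commits keeps them reachable; such a ref can be removed with `git push <remote> --delete <name>`, or force-pushed after rewriting it. Whether and when a hosting service prunes its own unreachable objects depends on that service.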
I hope this helps. Sven