
How to reduce git repo size on Bitbucket?

Summary of my problem: One of my private repositories on Bitbucket suddenly more than doubled in size after I pushed an addition of a few hundred bytes to two existing files. The repo is now over 2GB, which has caused Bitbucket to put it into read-only mode. Because it is in read-only mode, I cannot push changes that would reduce the repo size. (Catch 22.)

Details: My company recently began hosting git repositories on Bitbucket. One of the repositories I am in charge of had a size of about 973MB, which was uncomfortably close to the 1GB soft limit. To reduce the repo size, I followed the instructions in the Bitbucket documentation article "Split a repository in two" and moved about 450MB worth of documentation and online help files into their own private repo. I then followed the instructions in the Bitbucket documentation articles "Reduce repository size" and "Maintaining a git repository", specifically:

git count-objects -vH showed me a size-pack of about 973MB.

I ran git filter-branch --index-filter 'git rm --cached --ignore-unmatch doc' HEAD to remove the doc directory (which is the content I'd moved to the new repo).

I ran the following commands to expire reference and prune:

git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now

git count-objects -vH then showed me a size-pack of 881.1 MiB and du -sh .git/objects returned 882M. I was disappointed that moving over 450MB reduced the repo size by less than 90MB, but pushed the changes to Bitbucket nevertheless:

git push --all --force
git push --tags --force

The settings page for the Bitbucket copy of the repo continued to show a size of 973MB. I logged out, refreshed the browser, logged back in, but that didn't help -- the repo size remained at 973MB.

This morning (three days after the changes described above) I made a couple of minor additions to two existing files which increased the files' sizes by a total of less than 1KB, added and committed them to my local repo, then pushed the change to Bitbucket. A few minutes later I took a look at the Bitbucket page for the repo and saw a red warning banner informing me "This repo is over the 2 GB limit and is in read-only mode." The settings page now says the repo has a size of 2.3 GB.

The push of a few hundred bytes added to two files was definitely the only activity to occur on the remote repo in the last three days, according to Bitbucket. That push may not have been the cause of the repo more than doubling in size, but the two events were closely correlated in time.

git reflog show returns nothing.

Cloning a new copy into an alternate directory, then running git count-objects, gives me a size-pack of 881.29 MiB.

The local repository is on a CentOS 6.5 system. git version is 1.8.5.3.

Questions

  1. Why did moving 450MB of files out of the repo only reduce the size of my local repo by 90MB?
  2. Why did even that modest reduction not get pushed to the remote repo on Bitbucket?
  3. How on Earth did the remote repo size jump from 973MB to 2.3GB?
  4. How do I fix it? I cannot push to the remote repo even with the --force flag. Any push gets me the error message "conq: repository is in read only mode (over 2 GB size limit). fatal: Could not read from remote repository."

I've found that the easiest way to reduce the Bitbucket repo size if you are over the 2GB limit is to

  1. Create a branch on Bitbucket
  2. Delete that branch on Bitbucket

This should trigger Bitbucket to run git gc on the repo.

After conferring with Bitbucket technical support, I can now answer some of my own questions:

  1. Why did moving 450MB of files out of the repo only reduce the size of my local repo by 90MB? Something in the history got missed. I don't know what exactly, but the filter-branch command missed something. I was able to successfully reduce the repo size by 450MB by running the utility BFG Repo-Cleaner.
  2. Why did even that modest reduction not get pushed to the remote repo on Bitbucket? It did, but Bitbucket support must then run git gc on their side. One can contact Bitbucket support and ask them to run git gc on a repo.
  3. How on Earth did the remote repo size jump from 973MB to 2.3GB? Unknown. Bitbucket technical support didn't have the answer to this one either.
  4. How do I fix it? Contact Bitbucket support. They can put a repository back into read-write mode so that you can push a smaller repository and they can run git gc on their end.
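
One possible explanation for the first answer above, offered as an assumption rather than a confirmed diagnosis: the filter-branch command in the question was given only HEAD, so it rewrote only the current branch, and any other branch or tag could still keep the doc directory reachable. The sketch below rewrites every ref instead. The function names are hypothetical helpers, and -r is added to git rm because removing a directory requires it.

```shell
# Hypothetical helper: remove a directory from the history of every
# branch and tag, not only the branch that HEAD points to.
remove_dir_from_all_refs() {
    dir="$1"
    # --tag-name-filter cat moves tags onto the rewritten commits;
    # "-- --all" asks filter-branch to rewrite all refs.
    git filter-branch --force \
        --index-filter "git rm -r -q --cached --ignore-unmatch -- '$dir'" \
        --tag-name-filter cat -- --all
}

# Hypothetical helper: list every ref whose history still touches the
# directory. After a rewrite, the refs/original/ backups show up here
# until they are deleted.
refs_still_containing() {
    dir="$1"
    git for-each-ref --format='%(refname)' | while read -r ref; do
        if [ -n "$(git log --oneline -1 "$ref" -- "$dir")" ]; then
            echo "$ref"
        fi
    done
}
```

Run remove_dir_from_all_refs doc on a backup clone first. The space is only reclaimed locally after the refs/original/ cleanup, reflog expiry, and git gc --prune=now steps shown in the question.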

First, check the size of your local repository using the following command:

git count-objects -Hv

Then expire old reflog entries, prune unreachable objects, and repack with the following commands:

git reflog expire --expire="1 hour" --all
git reflog expire --expire-unreachable="1 hour" --all
git prune --expire="1 hour" -v
git gc --aggressive --prune="1 hour"

Now run git count-objects -Hv again to see the change in the repository's size and garbage.
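
To make the before/after comparison easier to read, and to see what is actually taking up the space, something like the following could be used. It is a minimal sketch: largest_blobs and pack_size are hypothetical helper names, and the %(rest) placeholder of git cat-file --batch-check is assumed to be available (it exists in the git 1.8.5 series mentioned in the question).

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref
# as "<size-in-bytes> <path>", largest first (default N is 10).
largest_blobs() {
    count="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k1,1nr |
        head -n "$count"
}

# Hypothetical helper: print only the size-pack value from count-objects.
pack_size() {
    git count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

Calling pack_size before and after the gc commands shows the change directly, and largest_blobs 20 lists the files worth removing from history if the size barely moves.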

How on Earth did the remote repo size jump from 973MB to 2.3GB?

This is a known bug on the Bitbucket Cloud side; see BCLOUD-19794.

Garbage file is intermittently counted in the repository size.

When you push to the remote repository, a GC is triggered afterwards, which generates a garbage file. This garbage file is cleared on the next GC. Between those two GCs, the Bitbucket UI displays the repository size incorrectly, because the garbage file is intermittently counted towards the repository's total size.

As noted in the workaround section of that issue, you need to contact Bitbucket to have them run the GC manually.

Bitbucket might take action sooner rather than later if enough people go vote for it.

As those familiar with git already know, git stores the full version history of your files, so committing changes and pushing them (even deletions) will not by itself reduce your repo size.

There are still several ways to reduce repo size on Bitbucket, GitHub, GitLab, etc. The best way is to delete branches: once a branch is gone, the commits and files reachable only from it can be garbage-collected, as long as no other branch or tag still references them. But you may want the latest files in that branch, so do the following:

  1. On your local machine, create a duplicate of the repo. (A backup, so you don't lose anything.)
  2. Delete the branch that you want to move or recreate from scratch. To delete the branch on the remote, use git push <remote> --delete <branch>.
  3. If you want to refresh the branch, copy the files into a new branch and push it.
  4. If you want to create a new remote repo instead, you can do that too.

Depending on the host, you may have to run special commands, but this should work in most cases.
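
The branch deletion in step 2 can be sketched as follows. delete_branch_everywhere is a hypothetical helper name, and the remote and branch names are whatever your repository uses. It must be run while a different branch is checked out, and on a hosted service the remote's disk usage only shrinks once the host runs its own garbage collection.

```shell
# Hypothetical helper: delete a branch locally and on the remote, then
# drop the objects that are no longer reachable from the local repo.
delete_branch_everywhere() {
    remote="$1"
    branch="$2"
    git branch -D "$branch"                 # force-delete the local branch
    git push "$remote" --delete "$branch"   # delete the branch on the remote
    git reflog expire --expire=now --all    # forget reflog entries that pin old commits
    git gc --prune=now                      # remove unreachable objects locally
}
```

For example, delete_branch_everywhere origin old-docs removes a branch named old-docs from both the local clone and the origin remote.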
