
Delete the history of an entire repository directory or delete commits with a given message in Git

I have a Git repository with a pretty long history. One of the directories in the repository is tracked, but consists of generated content. The size of the repository is becoming a problem and it is due to the changes in the generated directory, which are derivable from the other contents of the repository (it is only tracked due to certain tooling constraints). That means that the history of this one directory in particular is not very important, but for the rest of the repository, it is.

As I'm looking at ways to reduce the size of the repository without losing helpful history, I've identified two ways: either delete the history only for the files in this directory, in effect deleting history of this generated directory, or delete all commits which have a certain commit message, because in this case, the directory is only ever changed by commits with a certain commit message. Unfortunately, a better filter, like the contributor name or email, cannot be used as the automation which generates the directory impersonates one of the contributors to the repo.

Which of these two approaches is doable in Git? And if both are, which might be better? Are there any approaches I am missing? I only have limited experience with amending Git repo history, usually to fix commit messages or to wipe every trace of certain files, like secrets and keys. I want to inform myself before I unleash such a large-scale change on a repository.

I don't think I need to add this, but just in case: the repo is hosted on GitHub and I assume I can just force-push to GitHub after carrying out the changes to make sure the history on GitHub gets updated the same way. I don't expect there are changes I could make which would work locally but could not be carried over to GitHub specifically as a remote, but if there are, I'd like to learn about them.

BFG Repo Cleaner

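  1. Download BFG Repo Cleaner (it ships as a Java .jar; the bfg command below assumes an alias for java -jar bfg.jar)
  2. bfg --delete-folders content my-repo.git

Note that --delete-folders matches on the folder name, not on the path within the repo, so pass only the directory's own name (content here stands in for yours) and check that no folder you want to keep shares that name. By default BFG also leaves the contents of your latest commit (HEAD) untouched.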

Git filter-branch

git filter-branch --force --index-filter \
  "git rm -r --cached --ignore-unmatch path/to/your/content" \
  --prune-empty --tag-name-filter cat -- --all

Once you're happy with the result, you'll need to force-push so that any remotes (such as origin) pick up the rewritten history.
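The rewrite above can be tried out safely on a throwaway repository first. The following is only a sketch under stated assumptions: the directory name generated, the file names, and the commit messages are all hypothetical, and the cleanup steps (deleting the refs/original backups, expiring reflogs, running gc) are the usual way to let Git actually reclaim the space locally, not something the command above does for you.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning

demo=$(mktemp -d)
cd "$demo"
git init -q .

mkdir -p src generated
echo 'code' > src/main.txt
git add src && git commit -q -m 'Add source'
echo 'v1' > generated/out.txt
git add generated && git commit -q -m 'Regenerate output'
echo 'more code' >> src/main.txt
git add src && git commit -q -m 'Update source'

# Rewrite every ref, dropping generated/ from each commit's index.
# --prune-empty discards commits that end up changing nothing.
git filter-branch --force --index-filter \
  'git rm -r --cached --ignore-unmatch generated' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Delete the backup refs and reflog entries that still pin the old objects,
# then repack so the space is actually reclaimed.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

# Verify: no remaining commit touches generated/.
remaining=$(git log --all --oneline -- generated | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits touching generated/: $remaining, commits left: $commits"
```

On the real repository, the final step would be something like git push origin --force --all followed by git push origin --force --tags, assuming the remote is named origin. Work on a fresh clone so that an untouched copy remains if anything goes wrong.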

Git works by snapshotting the whole directory tree that it tracks.

This means that you cannot rewrite the history of just one directory in isolation. You have to rewrite every commit that contains it, and because each commit's SHA-1 hash depends on its tree and on its parent, the hashes of all later commits change too. Everyone will then have to check out the repository anew.
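To make the point about changing hashes concrete, here is a small sketch on a throwaway repository (all directory names, file names, and commit messages are hypothetical). The last commit never touches the generated directory, yet its hash still changes after the rewrite, because its snapshot contained that directory and its parent was rewritten.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning

repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir src generated
echo 'code' > src/main.txt
git add src && git commit -q -m 'Add source'
echo 'v1' > generated/out.txt
git add generated && git commit -q -m 'Regenerate output'
echo 'notes' > notes.txt
git add notes.txt && git commit -q -m 'Add notes'

# A commit object names one tree (the full snapshot) and its parent commit.
git cat-file -p HEAD

before=$(git rev-parse HEAD)
tree_before=$(git rev-parse 'HEAD^{tree}')

# Remove generated/ from every commit in every ref.
git filter-branch --force --index-filter \
  'git rm -r --cached --ignore-unmatch generated' \
  --prune-empty -- --all >/dev/null 2>&1

after=$(git rev-parse HEAD)
tree_after=$(git rev-parse 'HEAD^{tree}')
subject=$(git log -1 --format=%s)

echo "HEAD before: $before"
echo "HEAD after:  $after ($subject)"
```

The subject of the last commit is unchanged, but both its tree and its commit hash differ, which is why existing clones cannot simply pull the rewritten history.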

We did this a few years back as part of moving files to fit a Maven structure and wanting the history to stay on the files, but it was a non-trivial task.

You may want to reach a suitable checkpoint, and then simply start a new repository with your current files, leaving the old repository around as a reference for those who need it.
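The fresh-start option could look roughly like the sketch below. It assumes the old repository stays available elsewhere; the paths old-repo and new-repo, the file name, and the commit messages are hypothetical stand-ins. git archive exports only the tracked files at HEAD, so no history comes along.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
old="$work/old-repo"   # stand-in for the existing repository
new="$work/new-repo"   # the fresh repository

# Build a small stand-in for the old repository with two commits.
git init -q "$old"
(
  cd "$old"
  echo 'code' > main.txt
  git add main.txt && git commit -q -m 'Add source'
  echo 'more code' >> main.txt
  git commit -q -am 'Update source'
)

# Export the tracked files at HEAD, without any history.
mkdir "$new"
git -C "$old" archive HEAD | tar -xf - -C "$new"

# Start over with a single commit that records where the snapshot came from.
git init -q "$new"
git -C "$new" add -A
git -C "$new" commit -q \
  -m "Import snapshot of old repository at $(git -C "$old" rev-parse --short HEAD)"

old_count=$(git -C "$old" rev-list --count HEAD)
new_count=$(git -C "$new" rev-list --count HEAD)
echo "old repository: $old_count commits, new repository: $new_count commit"
```

Recording the old commit hash in the import message makes it easier to find the matching point in the archived repository later. The trade-off is that tools such as git blame and git log stop at the import commit in the new repository.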

