As a recent question hinted, I'm looking for a way to speed up operations on a Git repository with a very large number of files (~6 million). I'd rather not use submodules.

The problem is that operations are quite slow. Is it possible to keep one large repository but tell Git to focus on only a portion of it? I thought creating a sparse checkout would do it, but the `read-tree` operation deletes the files not listed in the sparse-checkout file and takes a very long time. Is there a way to run `read-tree` that leaves all the other files where they are, with a cost proportional only to the number of files listed in the sparse-checkout file?
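For context, the classic sparse-checkout workflow referred to above looks roughly like this. This is a self-contained sketch against a throwaway demo repository (the repo, paths, and file names are made up for illustration); in a real 6-million-file repository the `read-tree` step is exactly where the slowness appears.

```shell
#!/bin/sh
# Sketch of the classic (Git 1.7+) sparse-checkout workflow.
# The demo repo below exists only to make the example runnable.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
mkdir -p keep drop
echo a > keep/a.txt
echo b > drop/b.txt
git add . && git commit -qm init

# enable sparse checkout and restrict the work tree to keep/
git config core.sparseCheckout true
echo "keep/" > .git/info/sparse-checkout
# read-tree re-applies the patterns: files outside keep/ are
# removed from the working directory (but stay in history)
git read-tree -mu HEAD
ls   # only keep/ remains in the working directory
```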
Not currently, no. Git only recently (1.7+) added any sparse-checkout support at all, and it's still fairly bare-bones, mostly because Git wasn't really designed to work with only part of a repository.
It was designed as a one-repository-per-project version control system. Submodules are the mechanism it offers for "projects" composed of many large subcomponents.
First, I would suggest learning and using Submodules.
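The idea is to split the large subcomponents into their own repositories and reference them from a parent repository. A minimal self-contained sketch (all repositories here are throwaway local ones created for illustration; the `protocol.file.allow` override is only needed because recent Git versions restrict file-protocol submodules by default):

```shell
#!/bin/sh
# Sketch: one repo per large subcomponent, referenced as a submodule.
set -e
work=$(mktemp -d)

# a "component" repository standing in for one large subcomponent
git init -q "$work/component"
(cd "$work/component" \
  && git config user.email d@example.com && git config user.name demo \
  && echo lib > lib.txt && git add . && git commit -qm init)

# the parent project records the component as a submodule,
# pinning it to a specific commit
git init -q "$work/parent"
cd "$work/parent"
git config user.email d@example.com
git config user.name demo
git -c protocol.file.allow=always submodule add "$work/component" component
git commit -qm "add component as a submodule"
git submodule status   # shows the pinned commit of component
```

A fresh clone would then use `git clone --recurse-submodules` to fetch the parent and its components together, or fetch only the components it actually needs.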
You can script what you like with

```
git ls-tree sha1
git show sha1:path/to/some/file.txt
```

and other low-level commands, combined with standard shell tools such as `xargs`, `grep`, and `cut`, and piping.
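For example, plumbing commands let you inspect files in a commit without ever checking them out, which sidesteps the cost of a huge working tree. A self-contained sketch (the demo repo and paths are invented for illustration):

```shell
#!/bin/sh
# Sketch: script over a commit's contents with plumbing commands,
# without touching the working tree. Demo repo created just for show.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email d@example.com
git config user.name demo
mkdir -p docs src
printf 'hello\nworld\n' > docs/a.txt
echo 'int x;' > src/x.c
git add . && git commit -qm init

# list every path in HEAD, keep only .txt files,
# and print the first line of each straight from the object store
git ls-tree -r --name-only HEAD |
  grep '\.txt$' |
  while IFS= read -r p; do
    git show "HEAD:$p" | head -n 1
  done
```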