
How can I fetch a single file at a specific commit (by hash) from a remote git repository?

I'd like to fetch a file at a commit from a remote git repository without fetching all objects in the repository. I know git archive doesn't work as it can only fetch the tip of a branch.

With sparse-checkout and protocol v2 (thanks @bk2204), I can create a work tree containing only the README at a given commit, but git still transfers tens of thousands of objects, about 188 MiB.

$ mkdir linux
$ cd linux
$ git init
$ git config core.sparseCheckout true
$ git config protocol.version 2
$ git remote add origin git@github.com:torvalds/linux.git
$ echo "/README" > .git/info/sparse-checkout
$ git fetch --depth 1 origin ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7
$ git checkout ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7
remote: Enumerating objects: 71432, done.
remote: Counting objects: 100% (71432/71432), done.
remote: Compressing objects: 100% (66651/66651), done.
remote: Total 71432 (delta 5277), reused 25451 (delta 3920), pack-reused 0
Receiving objects: 100% (71432/71432), 188.85 MiB | 7.71 MiB/s, done.
Resolving deltas: 100% (5277/5277), done.

Ideally, this operation would fetch only three objects: the commit (the known hash), the commit's tree, and the file's blob in that tree.

$ git cat-file -p ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7 | grep tree
tree f6760b0bf32bd3b9a760d6e895c7fb76cd9c2ef8
$ git cat-file -p f6760b0bf32bd3b9a760d6e895c7fb76cd9c2ef8 | grep README
100644 blob 669ac7c32292798644b21dbb5a0dc657125f444d    README
$ git cat-file -p 669ac7c32292798644b21dbb5a0dc657125f444d

In general, you cannot fetch individual objects without partial clone support. The protocol doesn't allow it. Sparse checkout doesn't prevent you from fetching all of the data; it only prevents you from checking it all out.

I'm not aware of any major Git hosting providers that have generally available partial clone support right now, although I suspect it will be coming soon. The feature is still relatively experimental.

However, if you're using a remote that supports protocol v2, you can fetch a specific commit by hash, even if the remote wouldn't otherwise allow it. Run git config protocol.version 2, and then you'll be able to fetch individual commits by hash. Doing that with --depth 1 is the best you can do in this particular case.
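For a remote that does support partial clone, the idea above can be sketched as follows. This is only an illustration under stated assumptions: the "remote" is a throwaway local repository created by the script (hypothetical file names and contents), and uploadpack.allowFilter and uploadpack.allowAnySHA1InWant are switched on there by hand. The fetch uses --filter=blob:none, so the commit and its trees are transferred without file contents, and the README blob is fetched on demand when it is first read.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the remote: a small local repository.
git init -q server
cd server
git config user.email "demo@example.invalid"
git config user.name "Demo"
echo "hello from README" > README
echo "other data" > other.txt
git add README other.txt
git commit -q -m "initial commit"
commit=$(git rev-parse HEAD)
# Server-side switches (assumed to be under our control in this sketch):
# accept filtered fetches and wants that name an arbitrary object hash.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
cd ..

# Client: fetch the commit by hash, one level deep, without any blobs.
git init -q client
cd client
git remote add origin "file://$work/server"
git fetch -q --depth 1 --filter=blob:none origin "$commit"

# Reading the file triggers an on-demand fetch of just that blob.
readme=$(git cat-file -p "$commit:README")
echo "$readme"
```

Against a real host the same two client commands would apply, provided the server advertises filter support; other.txt is never downloaded unless something reads it.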

A hosting service usually provides an API to download a file at a given revision or to retrieve its content. For example, GitLab has GET /projects/:id/repository/files/:file_path/raw, GitHub has GET /repos/:owner/:repo/contents/:path, and Gerrit has GET /projects/{project-name}/commits/{commit-id}/files/{file-id}/content.
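As a rough sketch of the GitHub variant, the commit hash can be passed in the ref query parameter, and an Accept header can request the raw file body instead of JSON. The helper names github_contents_url and fetch_github_file are made up for this example, and the download assumes a public repository and network access, so it is left as a commented-out call.

```shell
#!/bin/sh
set -eu

# Build the GitHub contents URL for one file at one revision.
# Arguments: owner, repo, path, ref (branch, tag, or commit hash).
github_contents_url() {
    printf 'https://api.github.com/repos/%s/%s/contents/%s?ref=%s\n' "$1" "$2" "$3" "$4"
}

# Download the raw file body (hypothetical helper; needs curl and network access).
# Unauthenticated requests are rate limited by GitHub.
fetch_github_file() {
    curl -fsSL -H 'Accept: application/vnd.github.raw+json' \
        "$(github_contents_url "$1" "$2" "$3" "$4")"
}

url=$(github_contents_url torvalds linux README ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7)
echo "$url"

# To actually download the file:
#   fetch_github_file torvalds linux README ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7 > README
```

This transfers only the one file and never touches the git object database, which is why it is usually the cheapest option when the host offers such an API.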

For a simple self-hosted repository, to fetch an arbitrary commit or object with git fetch, you can set uploadpack.allowReachableSHA1InWant=true or uploadpack.allowAnySHA1InWant=true on the server. Both are false by default, for safety and performance reasons. A self-hosted Gerrit has similar configuration options. I have no idea about a self-hosted GitLab.
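A minimal sketch of that server-side setting, using a throwaway local repository as the "self-hosted" server (the directory names, commit messages, and file contents are invented for the demo). The client fetches an older, non-tip commit by its hash and reads the file from it.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical self-hosted repository with two commits.
git init -q server
cd server
git config user.email "demo@example.invalid"
git config user.name "Demo"
echo "first version" > README
git add README
git commit -q -m "first"
old=$(git rev-parse HEAD)
echo "second version" > README
git commit -q -am "second"
# Let clients request any commit reachable from a ref by its hash.
git config uploadpack.allowReachableSHA1InWant true
cd ..

# Client: fetch the older commit by hash and read the file from it.
git init -q client
cd client
git fetch -q --depth 1 "file://$work/server" "$old"
content=$(git show "$old:README")
echo "$content"
```

Note that this still transfers every tree and blob of the requested commit; the setting only controls which commits may be asked for by hash, not how much of each commit is sent.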
