
Partial clone with Git and Mercurial

Is it possible to clone only one branch (or start from a given commit) in Git and Mercurial? I want to clone a central repo, but since it's huge I'd like to get only part of it and still be able to contribute my changes back. Is that possible? For example, can I get only the history from tag 130 onwards?

If so, how?

In Git land you are talking about three different types of partial clones:

  • shallow clones: I want history from revision point X onward.

    Use git clone --depth <n> <url> for that, but please remember that shallow clones are somewhat limited in interacting with other repositories. You would be able to generate patches and send them via email.

  • partial clone by filepath: I want all revision history in some directory /path.

    Not possible in Git. With modern Git, though, you can have a sparse checkout, i.e. you have the whole history but you check out (have in the working area) only a subset of all files.

  • cloning only selected branch: I want to clone only one branch (or a selected subset of branches).

    Possible, and

    before git 1.7.10 not simple: you would need to do manually what clone does, i.e. git init [<directory>], then git remote add origin <url>, edit .git/config replacing * in remote.origin.fetch with the requested branch (probably 'master'), then git fetch.

    as of git 1.7.10, git clone offers the --single-branch option, which appears to have been added for just this purpose and is easy to use.

    Note however that because branches usually share most of their history, the gain from cloning only a subset of branches might be smaller than you think.
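The manual, pre-1.7.10 sequence described above can be sketched roughly as follows. This is only an illustration under assumptions: the throwaway "central" repository, the branch names master and feature, and the temporary paths are all made up for the demo, and git config is used in place of hand-editing .git/config.

```shell
set -eu
work=$(mktemp -d)

# Throwaway "central" repository with two branches (hypothetical names).
git init -q "$work/central"
cd "$work/central"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo one > file.txt
git add file.txt
git commit -q -m "first commit on master"
git checkout -q -b feature
echo two > file.txt
git commit -q -am "commit on feature"
git checkout -q master

# Manual equivalent of a single-branch clone: init, add the remote,
# then narrow the fetch refspec before fetching.
git init -q "$work/manual"
cd "$work/manual"
git remote add origin "file://$work/central"
# Replace the default wildcard refspec with one that names only 'master'.
git config remote.origin.fetch "+refs/heads/master:refs/remotes/origin/master"
git fetch -q origin

# List the remote-tracking refs that were actually fetched.
fetched=$(git for-each-ref --format='%(refname)' refs/remotes/origin)
echo "$fetched"
```

Only the remote-tracking ref for master should be listed; the feature branch is never transferred.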

You can also do a shallow clone of only a selected subset of branches.
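As a minimal sketch of combining the two options, assuming a throwaway local repository (the paths and the branch names main and feature are invented for the demo), a shallow clone restricted to one branch might look like this:

```shell
set -eu
work=$(mktemp -d)

# Throwaway "central" repository with three commits and an extra branch.
git init -q "$work/central"
cd "$work/central"
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done
git branch feature

# --depth is ignored for plain local paths, hence the file:// URL.
git clone -q --depth 1 --single-branch --branch main \
  "file://$work/central" "$work/shallow"

# Count the commits and remote branches that ended up in the clone.
commits=$(git -C "$work/shallow" rev-list --count HEAD)
branches=$(git -C "$work/shallow" for-each-ref --format='%(refname)' \
  refs/remotes/origin | grep -v '/HEAD$')
echo "commits: $commits"
echo "remote branches: $branches"
```

The clone should hold a single commit and track only the requested branch, which is the smallest download these two options can give.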

If you know how people will want to break things down by filepath (multiple projects in the same repository), you can use submodules (sort of like svn:externals) to pre-split the repo into separately cloneable portions.
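A rough sketch of such a pre-split layout, assuming two throwaway local repositories (the names libutil, super and libs/util are hypothetical), could look like this:

```shell
set -eu
work=$(mktemp -d)

# A small library repository that will become a submodule.
git init -q "$work/libutil"
cd "$work/libutil"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "utility library" > README
git add README
git commit -q -m "initial library commit"

# The super-project that references the library at a fixed commit.
git init -q "$work/super"
cd "$work/super"
git config user.email "demo@example.com"
git config user.name "Demo"
# Recent Git versions refuse file:// submodule URLs unless explicitly
# allowed; a real project would use an https:// or ssh:// URL instead.
git -c protocol.file.allow=always submodule --quiet add \
  "file://$work/libutil" libs/util
git commit -q -m "add libs/util as a submodule"

# Show the recorded submodule path and URL.
cat .gitmodules
```

Someone who needs only the library can then clone it on its own, while the super-project records which library commit it expects.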

In Mercurial land you're talking about three different types of partial clones:

  • shallow clones: I want the history from revision point X onward: use the remotefilelog extension
  • partial clones by filepath: I want all revision history in directory /path: use the experimental narrowhg extension; or I want only the files in directory /path to be in my working directory: use the experimental sparse extension (shipped since version 4.3, see hg help sparse).
  • partial clones by branch: I want all revision history on branch Y: use clone -r

If you know how people will want to break things down by filepath (multiple projects in the same repo (shame on you)), you can use subrepositories (sort of like svn externals) to pre-split the repo into separately cloneable portions.

Also, as to the "so huge I'd like to only get a part of it": you really only have to do that once. Just clone it while you have lunch, and then you have it forevermore. Subsequently you can pull and get deltas efficiently going forward. And if you want another clone of it, just clone your first clone. Where you got a clone from doesn't matter (and local clones take up no additional disk space, since they're hard links under the covers).

The selected answer provides a good overview, but lacks a complete example.

Minimize your download and checkout footprint (a), (b):

git clone --no-checkout --depth 1 --single-branch --branch <name> <repo> <folder>
cd <folder>
git config core.sparseCheckout true
echo "target/path/1" >>.git/info/sparse-checkout
echo "target/path/2" >>.git/info/sparse-checkout
git checkout
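To see the effect end to end, the same commands can be run against a throwaway repository. This is a self-contained sketch: the "central" repository, the branch name main and the directory names are assumptions made up for the demo.

```shell
set -eu
work=$(mktemp -d)

# Throwaway "central" repository with one wanted and one unwanted directory.
git init -q "$work/central"
cd "$work/central"
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p target/path/1 other
echo wanted > target/path/1/a.txt
echo unwanted > other/b.txt
git add .
git commit -q -m "two directories"

# Shallow, single-branch clone without populating the working tree.
git clone -q --no-checkout --depth 1 --single-branch --branch main \
  "file://$work/central" "$work/sparse"
cd "$work/sparse"

# Enable sparse checkout and list the only path we want on disk.
git config core.sparseCheckout true
mkdir -p .git/info
echo "target/path/1" >> .git/info/sparse-checkout
git checkout -q

# Show which files were actually written to the working tree.
find . -path ./.git -prune -o -type f -print
```

Only the file under target/path/1 should appear in the working tree; other/b.txt is part of the fetched commit but is not written to disk.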

Periodically optimize your local repository footprint (c) (optional, use with care):

git clean --dry-run # consider and tweak results then switch to --force
git gc
git repack -Ad
git prune
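One way to check whether such housekeeping changes anything is to compare git count-objects before and after. The following is a small sketch on a throwaway repository (the path and file names are invented for the demo), using only git gc:

```shell
set -eu
work=$(mktemp -d)

# Throwaway repository with a few commits, all stored as loose objects.
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3; do
  echo "$i" > "file$i.txt"
  git add .
  git commit -q -m "commit $i"
done

# 'count' is the number of loose objects, 'packs' the number of packfiles.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')
echo "loose objects: $loose_before -> $loose_after, packs: $packs_after"
```

On a real repository, git count-objects -vH additionally reports sizes in human-readable units, which makes the before/after comparison easier to read.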

See also: How to handle big repositories with git

This method creates an unversioned archive without subrepositories:

hg clone -U ssh://machine//directory/path/to/repo/project projecttemp

cd projecttemp

hg archive -r tip ../project-no-subrepos

The unversioned source code without the subrepositories is in the project-no-subrepos directory.

Regarding Git, it may be of historical significance that Linus Torvalds answered this question from a conceptual perspective back in 2007, in a talk that was recorded and is available online.

The question is whether it is possible to check out only some files out of a Git repository.

Tech Talk: Linus Torvalds on git t=43:10

To summarize, he said that one of the design decisions of Git that sets it apart from other source management systems (he cites BitKeeper and SVN) is that Git manages content, not files. One implication is that, e.g., a diff of a subset of files in two revisions is computed by first taking the whole diff and then pruning it to only the files that were requested. Another is that you have to check out the whole history, in an all-or-nothing fashion. For this reason, he suggests splitting loosely related components among multiple repositories and mentions a then-ongoing effort to implement a user interface for managing a repository that is structured as a super-project holding smaller repositories.

As far as I know, this fundamental design decision still applies today. The super-project thing probably became what are now submodules.

If, as in Brent Bradburn's answer, you do a repack in a Git partial clone, make sure to:

git clone --filter=blob:none --no-checkout https://github.com/me/myRepo
cd myRepo
git sparse-checkout init
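# Add the expected pattern, to include just a subfolder without top files.
# (With Git 2.37 or later, where cone mode is the default, a leading-slash
# pattern like this one is expected to need --no-cone.)
git sparse-checkout set /mySubFolder/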

# populate working-tree with only the right files:
git read-tree -mu HEAD
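To check what --filter=blob:none actually leaves out, the objects that are promised but not yet downloaded can be listed. The sketch below assumes a throwaway local repository (paths and file names are invented) and enables uploadpack.allowFilter on the serving side, which a filtered clone over file:// needs.

```shell
set -eu
work=$(mktemp -d)

# Throwaway "central" repository with two files (two distinct blobs).
git init -q "$work/central"
cd "$work/central"
git config user.email "demo@example.com"
git config user.name "Demo"
# The serving side must allow filters, otherwise --filter is ignored.
git config uploadpack.allowFilter true
mkdir mySubFolder
echo "inside" > mySubFolder/a.txt
echo "top-level" > top.txt
git add .
git commit -q -m "two files"

# Partial clone: commits and trees are fetched, blobs are left out.
git clone -q --filter=blob:none --no-checkout \
  "file://$work/central" "$work/partial"

# --missing=print lists objects that are referenced but not present
# locally, each prefixed with '?', without fetching them.
missing=$(git -C "$work/partial" rev-list --objects --missing=print HEAD \
  | grep -c '^?')
echo "blobs not yet downloaded: $missing"
```

Both blobs should be reported as missing until a checkout or read-tree needs them, at which point Git fetches them from the promisor remote.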

Regarding the local optimization in a partial clone, as in:

git clean --dry-run # consider and tweak results then switch to --force
git gc
git repack -Ad
git prune

use Git 2.32 (Q2 2021) or later: before 2.32, "git repack -A -d" in a partial clone unnecessarily loosened objects in promisor packs. This has been fixed.

See commit a643157 (21 Apr 2021) by Rafael Silva (raffs).
(Merged by Junio C Hamano -- gitster -- in commit a0f521b, 10 May 2021)

repack: avoid loosening promisor objects in partial clones

Reported-by: SZEDER Gábor
Helped-by: Jeff King
Helped-by: Jonathan Tan
Signed-off-by: Rafael Silva

When git repack -A -d is run in a partial clone, pack-objects is invoked twice: once to repack all promisor objects, and once to repack all non-promisor objects.
The latter pack-objects invocation is with --exclude-promisor-objects and --unpack-unreachable, which loosens all objects unused during this invocation.
Unfortunately, this includes promisor objects.

Because the -d argument to git repack subsequently deletes all loose objects also in packs, these just-loosened promisor objects will be immediately deleted.
However, this extra disk churn is unnecessary in the first place.
For example, in a newly-cloned partial repo that filters all blob objects (e.g. --filter=blob:none), repack ends up unpacking all trees and commits into the filesystem because every object, in this particular case, is a promisor object.
Depending on the repo size, this increases the disk usage considerably: In my copy of the linux.git, the object directory peaked 26GB of more disk usage.

In order to avoid this extra disk churn, pass the names of the promisor packfiles as --keep-pack arguments to the second invocation of pack-objects.
This informs pack-objects that the promisor objects are already in a safe packfile and, therefore, do not need to be loosened.

For testing, we need to validate whether any object was loosened.
However, the "evidence" (loosened objects) is deleted during the process which prevents us from inspecting the object directory.
Instead, let's teach pack-objects to count loosened objects and emit via trace2 thus allowing inspecting the debug events after the process is finished.
This new event is used on the added regression test.

Lastly, add a new perf test to evaluate the performance impact made by this changes (tested on git.git):

 Test          HEAD^                 HEAD
 ----------------------------------------------------------
 5600.3: gc    134.38(41.93+90.95)   7.80(6.72+1.35) -94.2%

For a bigger repository, such as linux.git, the improvement is even bigger:

 Test          HEAD^                     HEAD
 -------------------------------------------------------------------
 5600.3: gc    6833.00(918.07+3162.74)   268.79(227.02+39.18) -96.1%

These improvements are particular big because every object in the newly-cloned partial repository is a promisor object.


As noted with Git 2.33 (Q3 2021), the git-repack doc clearly states that it does operate on promisor packfiles (in a separate partition), with "-a" specified.

Presumably the statements here are outdated, as they feature from the first doc in 2017 (and the repack support was added in 2018)

See commit ace6d8e (02 Jun 2021) by Tao Klerks (TaoK).
(Merged by Junio C Hamano -- gitster -- in commit 4009809, 08 Jul 2021)

Signed-off-by: Tao Klerks
Reviewed-by: Taylor Blau
Acked-by: Jonathan Tan

See the technical/partial-clone man page.

Plus, still with Git 2.33 (Q3 2021), "git read-tree" had a codepath where blobs are fetched one-by-one from the promisor remote, which has been corrected to fetch in bulk.

See commit d3da223, commit b2896d2 (23 Jul 2021) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 8230107, 02 Aug 2021)

cache-tree: prefetch in partial clone read-tree

Signed-off-by: Jonathan Tan

"git read-tree" checks the existence of the blobs referenced by the given tree, but does not bulk prefetch them.
Add a bulk prefetch.

The lack of prefetch here was noticed at $DAYJOB during a merge involving some specific commits, but I couldn't find a minimal merge that didn't also trigger the prefetch in check_updates() in unpack-trees.c (and in all these cases, the lack of prefetch in cache-tree.c didn't matter because all the relevant blobs would have already been prefetched by then).
This is why I used read-tree here to exercise this code path.


Git 2.39 (Q4 2022) avoids calling 'cache_tree_update()' when doing so would be redundant.

See commit 652bd02, commit dc5d40f, commit 0e47bca, commit 68fcd48, commit 94fcf0e (10 Nov 2022) by Victoria Dye (vdye).
(Merged by Taylor Blau -- ttaylorr -- in commit a92fce4, 18 Nov 2022)

read-tree: use 'skip_cache_tree_update' option

Signed-off-by: Victoria Dye
Signed-off-by: Taylor Blau

When running 'read-tree' with a single tree and no prefix, 'prime_cache_tree()' is called after the tree is unpacked.
In that situation, skip a redundant call to 'cache_tree_update()' in 'unpack_trees()' by enabling the 'skip_cache_tree_update' unpack option.

Removing the redundant cache tree update provides a substantial performance improvement to 'git read-tree <tree-ish>', as shown by a test added to 'p0006-read-tree-checkout.sh':

 Test                            before            after
 ----------------------------------------------------------------------
 read-tree `br_ballast_plus_1`   3.94(1.80+1.57)   3.00(1.14+1.28) -23.9%

Note that the 'read-tree' in 't1022-read-tree-partial-clone.sh' is updated to read two trees, rather than one.
The test was first introduced in d3da223 ("cache-tree: prefetch in partial clone read-tree", 2021-07-23, Git v2.33.0-rc0 -- merge) to exercise the 'cache_tree_update()' code path, as used in 'git merge'.
Since this patch drops the call to 'cache_tree_update()' in single-tree 'git read-tree', change the test to use the two-tree variant so that 'cache_tree_update()' is called as intended.

In Mercurial, you should be able to do some of this using:

hg convert --branchmap FILE SOURCE DEST REVMAP

You may also want:

--config convert.hg.startrev=REV

The source can be git, mercurial, or a variety of other systems.

I haven't tried it, but convert is quite rich.
