Git clone all repositories in parallel, i.e. total time to clone everything is close to the time needed for the largest repo: fatal: index-pack failed
OK. OS X.
alias gcurl
alias gcurl='curl -s -H "Authorization: token IcIcv21a5b20681e7eb8fe7a86ced5f9dbhahaLOL" '
echo $IG_API_URL
https://someinstance-git.mycompany.com/api/v3
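Run the following to list all organizations the user has access to. Note: for a new user, passing just $IG_API_URL lists all the REST endpoints you can use.
gcurl ${IG_API_URL}/user/orgs
Running the above gave me a nice JSON output, which I fed into jq to extract the details. In the end I had the corresponding git URLs that I can use to clone each repo.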
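The jq step itself is not shown in the question. A minimal sketch of what it might look like, assuming the instance exposes the GitHub-style v3 REST API, where the repo list for an org is a JSON array whose items carry an `ssh_url` field. The helper name `ssh_urls_from_json`, the sample payload, and the org name are made up for illustration.

```shell
# Hypothetical helper: read a repo-list JSON array on stdin and
# print one SSH clone URL per line.
ssh_urls_from_json() {
    jq -r '.[].ssh_url'
}

# Made-up payload shaped like a repo-list response, for an offline try-out.
sample='[{"name":"some-repo1","ssh_url":"git@someinstance-git.mycompany.com:someorg1/some-repo1.git"},
         {"name":"some-repo2","ssh_url":"git@someinstance-git.mycompany.com:someorg1/some-repo2.git"}]'

# Only run the demo when jq is actually installed.
if command -v jq >/dev/null 2>&1; then
    printf '%s' "$sample" | ssh_urls_from_json
fi

# Against a live instance the pipeline might look like this (org name is a placeholder):
#   gcurl "${IG_API_URL}/orgs/someorg1/repos?per_page=100" | ssh_urls_from_json >> repos.txt
```

Results from such an endpoint are paginated, so an org with more repos than one page holds would need further requests.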
I created a master repo file:
git@someinstance-git.mycompany.com:someorg1/some-repo1.git
git@someinstance-git.mycompany.com:someorg1/some-repo2.git
git@someinstance-git.mycompany.com:someorg2/some-repo1.git
git@someinstance-git.mycompany.com:someorgN/some-repoM.git
...
....
some 1000+ such entries here in this file.
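I wrote a small one-liner script (it reads the file line by line; I know that is sequential) that runs git clone on each entry, and it works fine.
What I dislike about it, and am trying to find a better solution for:
1) It runs sequentially, one clone after another, so it is slow.
2) I want to clone all repos in roughly the time it takes to clone the largest one. For example, if repo A takes 3 seconds, B takes 20 seconds, C takes 3 seconds and every other repo takes under 10 seconds, I'd like to know whether there is a way to clone everything in about 20-30 seconds, rather than 3 + 20 + 3 + ... seconds, which adds up to many minutes.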
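The timing argument in point 2 can be checked without touching any Git server. A small sketch, using `sleep` as a stand-in for clones of different durations (the values 1, 2 and 1 seconds are arbitrary): when the jobs run in the background and the script then calls `wait`, the batch takes about as long as the slowest job rather than the sum of all of them.

```shell
# Stand-in for three clones that take 1, 2 and 1 seconds (arbitrary values).
start=$(date +%s)

for seconds in 1 2 1; do
    sleep "$seconds" &      # each "clone" runs in the background
done
wait                        # block until every background job has finished

elapsed=$(( $(date +%s) - start ))
echo "parallel wall time: ${elapsed}s (run one after another, the jobs would need about 4s)"
```

With real clones the picture is less clean, because the jobs compete for bandwidth, CPU and server connections.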
To get there, I tried the poor man's approach of running the git clone step in the background, so that the loop could move through the lines faster.
git clone ${git_url_line} $$_${datetimestamp}_${git_repo_fetch_from_url} &
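Well, the script finished almost immediately, and running ps -eAf|egrep "ssh|git"
showed a lot of interesting processes running. Coincidentally, one of my colleagues called out that Icinga was showing very high activity. I suspected it was because of me, but I had assumed I could run any number of git clones against my Git instance without causing a network outage or anything weird.
OK, things ran successfully for a while and I started seeing a bunch of git clone output on screen. In a second session I watched the folders being populated nicely, until I finally saw something I did not expect: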
Resolving deltas: 100% (3392/3392), done.
remote: Total 5050 (delta 0), reused 0 (delta 0), pack-reused 5050
Receiving objects: 100% (5050/5050), 108.50 MiB | 1.60 MiB/s, done.
Resolving deltas: 100% (1777/1777), done.
remote: Total 10691 (delta 0), reused 0 (delta 0), pack-reused 10691
Receiving objects: 100% (10691/10691), 180.86 MiB | 1.57 MiB/s, done.
Resolving deltas: 100% (5148/5148), done.
remote: Total 5994 (delta 6), reused 0 (delta 0), pack-reused 5968
Receiving objects: 100% (5994/5994), 637.66 MiB | 2.61 MiB/s, done.
Resolving deltas: 100% (3017/3017), done.
Checking out files: 100% (794/794), done.
packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
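I suspect you are running out of resources, either on the local machine or on the remote one, by starting ~1000 processes at once. You probably want to limit the number of processes that are started. One technique is to use xargs.
If you have access to GNU xargs, it might look like this: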
xargs --replace -P10 git clone {} < repos.txt
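-P10 means "10 processes", and --replace substitutes each input line for {}.
If you are stuck with a limited BSD xargs, as on OS X (or want better compatibility), you can use the more portable form: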
xargs -I{} -P10 git clone {} < repos.txt
This form also works with GNU xargs.
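To try the portable form without a network, here is a small self-contained sketch. It builds three empty bare repositories in a temporary directory (the names `repo1` to `repo3` and the limit of 2 jobs are arbitrary), lists their paths in `repos.txt`, and clones them with at most two `git clone` processes at a time. It assumes `git` and an `xargs` that supports `-P` are installed.

```shell
# Work in a throwaway directory so nothing outside it is touched.
work=$(mktemp -d)
mkdir "$work/remotes" "$work/clones"

# Create three empty bare repositories to act as "remotes"
# and write their paths, one per line, to repos.txt.
for name in repo1 repo2 repo3; do
    git init --bare -q "$work/remotes/$name.git"
    echo "$work/remotes/$name.git"
done > "$work/repos.txt"

# Clone them with at most 2 git processes running at any moment.
cd "$work/clones" || exit 1
xargs -I{} -P2 git clone -q {} < "$work/repos.txt"

ls "$work/clones"
```

Git warns that the cloned repositories are empty, which is expected here. Swapping `repos.txt` for the real URL list and raising `-P` gives the command from the answer.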
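Thanks, Anthony.
To run git clone in parallel (up to the limit given by -P for xargs), I tried various values (-P5, -P10, -P15, ..., -P100, ..., -P<Limit_number_as_per_ulimit>, -P<No.of.processes_a_user_can_have_at_a_given_time>). The conclusion is to stick with xargs -P5 or -P10, because with larger -P<N> values the run did not succeed every time (due to resource issues on the machine where I was running the command/script).
If you increase the -P value (N), you may see errors like the following: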
packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
or
fatal: The remote end hung up unexpectedly
or
fatal: early EOF
or
fatal: index-pack failed
or
sign_and_send_pubkey: signing failed: agent refused operation
or
ssh: connect to host somegit-instance.mycompany.com port 22: Operation timed out
fatal: Could not read from remote repository.
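Since failures like these are intermittent, one option (not part of the original answer) is to record the URLs whose clone failed and re-run only those, perhaps with a lower `-P`. A rough sketch, in which the function name `clone_recording_failures` and the file names are invented for illustration:

```shell
# clone_recording_failures NJOBS URL_FILE FAILED_FILE
# Clones every URL in URL_FILE with at most NJOBS parallel git processes and
# appends each URL whose clone failed to FAILED_FILE.
clone_recording_failures() {
    njobs=$1
    url_file=$2
    failed_file=$3
    : > "$failed_file"                       # start with an empty failure list
    # The URL reaches the inner shell as $1; the failure list path via the environment.
    FAILED_FILE=$failed_file xargs -I{} -P"$njobs" sh -c '
        git clone -q "$1" 2>/dev/null || echo "$1" >> "$FAILED_FILE"
    ' _ {} < "$url_file"
}

# Possible usage: a first pass at -P10, then a gentler second pass over the failures.
#   clone_recording_failures 10 repos.txt failed.txt
#   [ -s failed.txt ] && clone_recording_failures 2 failed.txt failed-again.txt
```

The sketch discards git's error output to keep the run quiet; dropping `2>/dev/null` would show which of the errors above each URL hit.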
The final script:
#!/bin/bash
# Variables
pattern=""; # Git pattern used to fetch entries from the master config, based on the user's parameter; defaults to blank.
usage() {
echo -e "\nUsage:\n------\ngit-clone-repos.parallel.sh [usage | help | <pattern>]\n"
echo "git-clone-repos.parallel.sh \"github.mycompany.com\" .................................... (This will re-clone every repository under every org in Git instance 'github.mycompany.com')"
echo "git-clone-repos.parallel.sh \"github.mycompany.com:tools-ansible-some-org\" ................ (This will re-clone every repository under org: 'tools-ansible-some-org' in Git instance 'github.mycompany.com')"
echo "git-clone-repos.parallel.sh \"somegit-instance.mycompany.com:coolrepo-org/somerepo.git\" .... (This will re-clone repo: 'somerepo' in org: 'coolrepo-org' in Git instance: 'somegit-instance.mycompany.com')"
echo -e "\n\n"
}
# If help/usage as first arg, show usage help
if [[ ("$1" == "usage" || "$1" == "help") || $# -eq 0 ]]; then usage; exit 0; fi
# Set pattern
pattern="$1"
mc_file=~/AKS/common/master-config.git-repos-ssh-urls.txt
echo "-- Master config file: $mc_file"; echo
echo "-- Pattern passed for fetching repos from master config file is: \"$pattern\""
# Create a fresh workspace dir in PWD so that every run starts in a new folder. Tweak this if you don't want it.
dir="$$_$(date +%s)"
mkdir "${dir}" && cd "${dir}" || exit 1
# First create a temp repo file filtered by pattern, keeping only lines that contain '@' (i.e. ignoring commented-out lines)
tmprepofile=$(mktemp)
grep -- "${pattern}" "${mc_file}" | grep '@' | cut -d':' -f3- > "${tmprepofile}"
# Git clone in parallel (xargs -P5 proved most reliable; -P10 can also be used).
# Clone each repo under a unique directory name so that repos from any org in any instance never collide.
# The URL is passed to bash as $1 rather than spliced into the command string, so it is never re-parsed as shell code.
xargs -I{} -P10 bash -c 'url=$1; dir=$(echo "$url" | cut -d@ -f2 | sed "s#[:/]#__#g"); git clone "$url" "${dir%.git}"' _ {} < "${tmprepofile}"
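To see which directory names the last line of the script produces, the name mapping can be tried on its own. The sketch below wraps it in a helper called `repo_dir_name` (a hypothetical name, not part of the script) and assumes SSH URLs that contain a single `@` and end in `.git`:

```shell
# Hypothetical helper mirroring the name mapping used in the xargs line:
# keep the part after "@", turn ":" and "/" into "__", strip a trailing ".git".
repo_dir_name() {
    name=$(echo "$1" | cut -d@ -f2 | sed 's#[:/]#__#g')
    echo "${name%.git}"
}

repo_dir_name "git@github.mycompany.com:coolrepo-org/somerepo1.git"
# prints: github.mycompany.com__coolrepo-org__somerepo1
repo_dir_name "git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git"
# prints: somegit-instance.mycompany.com__pipeline__shinynew-cool-pipeline
```

Because instance and org are both part of the name, two repos that share a name in different orgs end up in different directories.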
The sample master config file used is:
#-- Sample Master Config file, which can be generated using GIT rest api - against a user's org to find all user org repositories (in my case) looks like:
## github coolrepo-org org/repogroup contains:
##-----------
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo1.git
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo2.git
## somegit-instance pipeline org/repogroup contains:
##-----------
somegit-instance.mycompany.com:pipeline:git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git
## !!!!! NO ORG ACCESS REPO ENTRIES BELOW !!!!! ##
## -----------------------------------------------
## somegit-instance Misc: no access at org level, but access at repo level; entries:
##----------- (appended to the master file at the end of master file generation script) ---------
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somerepofooter.git
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somereponav.git
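To make the `grep | grep | cut` filter from the script concrete, the sketch below runs the same pipeline over a few of the sample lines. The temporary file and the chosen pattern are only for illustration.

```shell
# A few lines in the master-config format: <instance>:<org>:<ssh clone URL>
mc_sample=$(mktemp)
cat > "$mc_sample" <<'EOF'
## github coolrepo-org org/repogroup contains:
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo1.git
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo2.git
somegit-instance.mycompany.com:pipeline:git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git
EOF

# Same filter as the script: match the pattern, keep only lines with '@',
# then keep everything from the third ':'-separated field onward (the SSH URL).
pattern="github.mycompany.com:coolrepo-org"
urls=$(grep "$pattern" "$mc_sample" | grep '@' | cut -d':' -f3-)
echo "$urls"
# prints:
#   git@github.mycompany.com:coolrepo-org/somerepo1.git
#   git@github.mycompany.com:coolrepo-org/somerepo2.git
```

Note that grep treats the pattern as a regular expression, so each dot matches any character; `grep -F` would match the pattern literally.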