
Git: clone all repositories in parallel, i.e. total time taken to clone all is close to what you'd take for the largest repo: fatal: index-pack failed

OK, I'm on macOS.

alias gcurl
alias gcurl='curl -s -H "Authorization: token IcIcv21a5b20681e7eb8fe7a86ced5f9dbhahaLOL" '

echo $IG_API_URL 
https://someinstance-git.mycompany.com/api/v3

Run the following to see the list of all organizations the user has access to. Note: for a new user, passing just $IG_API_URL here returns all the REST endpoints you can use.

gcurl ${IG_API_URL}/user/orgs

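Running the above gave me a nice JSON object as output. I piped it into jq to pull out the information, and I now have the corresponding git URL for each repo, which I can use to clone it.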
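
For anyone reproducing this step, the jq part could look roughly like the sketch below. It assumes the instance speaks the GitHub v3 REST API, where each organization object has a "login" field and each repository object an "ssh_url" field. The function names and the GIT_TOKEN variable are made up for illustration, and only the first 100 repos per org are fetched.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a GitHub-compatible v3 API and that jq is installed.
# GIT_TOKEN and all function names are hypothetical, not from the original post.

# Print the "login" of every org in a /user/orgs JSON response read from stdin.
org_logins() {
  jq -r '.[].login'
}

# Print the "ssh_url" of every repo in an /orgs/<org>/repos JSON response read from stdin.
repo_ssh_urls() {
  jq -r '.[].ssh_url'
}

# Print one SSH clone URL per line for every org the token can see.
# Pagination is not handled: at most 100 repos per org are returned.
list_all_ssh_urls() {
  local api_url=$1 org
  curl -s -H "Authorization: token ${GIT_TOKEN}" "${api_url}/user/orgs" | org_logins |
    while read -r org; do
      curl -s -H "Authorization: token ${GIT_TOKEN}" \
        "${api_url}/orgs/${org}/repos?per_page=100" | repo_ssh_urls
    done
}
```

Something like list_all_ssh_urls "$IG_API_URL" > repos.txt would then produce the list of clone URLs.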

I created a master repo file:

git@someinstance-git.mycompany.com:someorg1/some-repo1.git
git@someinstance-git.mycompany.com:someorg1/some-repo2.git
git@someinstance-git.mycompany.com:someorg2/some-repo1.git
git@someinstance-git.mycompany.com:someorgN/some-repoM.git
...
....
some 1000+ such entries here in this file.

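I wrote a small one-liner script (it reads the file line by line; I know that's sequential, but still) that runs git clone, and it works fine.

What I hate, and am trying to find a better solution for, is:
1) It runs sequentially, one repo after another, so it is slow.

2) I want to clone all the repos in about the time it takes to clone the largest one. For example, if repo A takes 3 seconds, B takes 20 seconds, C takes 3 seconds, and all the other repos take under 10 seconds each, I'd like to know whether there is a way to clone everything within 20-30 seconds (versus 3 + 20 + 3 + ... seconds, which adds up to many minutes).

To do this, I tried my poor man's approach of running the git clone step in the background, so that I could iterate over the lines faster.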

git clone ${git_url_line} $$_${datetimestamp}_${git_repo_fetch_from_url} &

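Sure enough, the script finished quickly, and running ps -eAf|egrep "ssh|git" showed plenty of interesting processes running. Coincidentally, one of the guys shouted that Icinga was showing some very high numbers. I thought that was because of me, but I figured I could run N git clones against my Git instance without causing a network outage or anything weird.

OK, things ran successfully for a while, and I started seeing a bunch of git clone output on my screen. In a second session I could see the folders being populated nicely, until I finally saw something I did not expect: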

Resolving deltas: 100% (3392/3392), done.
remote: Total 5050 (delta 0), reused 0 (delta 0), pack-reused 5050
Receiving objects: 100% (5050/5050), 108.50 MiB | 1.60 MiB/s, done.
Resolving deltas: 100% (1777/1777), done.
remote: Total 10691 (delta 0), reused 0 (delta 0), pack-reused 10691
Receiving objects: 100% (10691/10691), 180.86 MiB | 1.57 MiB/s, done.
Resolving deltas: 100% (5148/5148), done.
remote: Total 5994 (delta 6), reused 0 (delta 0), pack-reused 5968
Receiving objects: 100% (5994/5994), 637.66 MiB | 2.61 MiB/s, done.
Resolving deltas: 100% (3017/3017), done.
Checking out files: 100% (794/794), done.
packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

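I suspect you're running out of resources, either on your local machine or on the remote one, by spawning ~1000 processes at once. You'll probably want to limit the number of processes that get started. One technique is to use xargs.

If you have access to GNU xargs, it might look like this: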

xargs --replace -P10 git clone {} < repos.txt
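  • -P10 means "run up to 10 processes at a time"
  • --replace substitutes each input line for {}

If you're stuck with a more limited BSD xargs, as on OS X (or you want better compatibility), you can use the more portable form: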

xargs -I{} -P10 git clone {} < repos.txt

This form also works with GNU xargs.
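
If you want to try the concurrency cap before pointing it at the real server, a small wrapper like the one below may help. It is only a sketch: run_limited is a made-up helper name, and it does nothing beyond wrapping the portable xargs form shown above.

```shell
#!/usr/bin/env bash
# Sketch: run one command per input line, at most max_jobs at a time.
# "run_limited" is a hypothetical helper; it only wraps the portable xargs call.
run_limited() {
  local max_jobs=$1
  shift
  # -I{} substitutes each input line for {}; -P caps the number of concurrent processes.
  xargs -I{} -P"${max_jobs}" "$@" {}
}
```

A dry run such as run_limited 10 echo git clone < repos.txt prints the commands without running them; dropping the echo performs the actual clones. xargs exits non-zero if any of the commands fails, so the overall result can still be checked.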

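Thanks, Anthony.

To run the git clones in parallel (up to the -P value given to xargs), I tried various numbers: -P5, -P10, -P15, ..., -P100, ..., -P<Limit_number_as_per_ulimit>, -P<No.of.processes_a_user_can_have_at_a_given_time>. The conclusion is to stick with xargs -P5 to -P10, because with larger -P<N> values the run did not succeed every time (due to resource issues on the machine where I was running the command/script).

If you increase -P (the N value), you may see errors like these: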

packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
or
fatal: The remote end hung up unexpectedly
or
fatal: early EOF
or
fatal: index-pack failed
or
sign_and_send_pubkey: signing failed: agent refused operation
or
ssh: connect to host somegit-instance.mycompany.com port 22: Operation timed out
fatal: Could not read from remote repository.
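
Since these failures looked transient in my runs, one option (not something the script below does) is to retry a failed clone a few times before giving up. A minimal sketch, assuming bash; retry is a hypothetical helper, not part of git or xargs:

```shell
#!/usr/bin/env bash
# Sketch: retry a command with a pause that doubles after each failed attempt.
# "retry" is a hypothetical helper, not part of git or xargs.
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))       # back off: double the pause each time
    attempt=$((attempt + 1))
  done
}
```

For example, retry 3 5 git clone "$url" "$dir" tries up to three times, pausing 5 and then 10 seconds between attempts. To call it from xargs, export it first with export -f retry and invoke it through bash -c. If a failed attempt leaves a partial directory behind, remove it before retrying.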

The final script:

#!/bin/bash

# Variables
pattern=""; # Pattern used to select entries from the master config, taken from the user's parameter; defaults to blank.

usage() {
 echo -e "\nUsage:\n------\ngit-clone-repos.parallel.sh [usage | help | <pattern>]\n"
 echo "git-clone-repos.parallel.sh \"github.mycompany.com\"             .................................... (This will re-clone every repository under every org in Git instance 'github.mycompany.com')"
 echo "git-clone-repos.parallel.sh \"github.mycompany.com:tools-ansible-some-org\"  ................ (This will re-clone every repository under org: 'tools-ansible-some-org' in Git instance 'github.mycompany.com')"
 echo "git-clone-repos.parallel.sh \"somegit-instance.mycompany.com:coolrepo-org/somerepo.git\"  .... (This will re-clone repo: 'somerepo' in org: 'coolrepo-org' in Git instance: 'somegit-instance.mycompany.com')"
 echo -e "\n\n"
}

# If help/usage as first arg, show usage help
if [[ ("$1" == "usage" || "$1" == "help") || $# -eq 0 ]]; then usage; exit 0; fi

# Set pattern
pattern="$1"
mc_file=~/AKS/common/master-config.git-repos-ssh-urls.txt
echo "-- Master config file: $mc_file"; echo
echo "-- Pattern passed for fetching repos from master config file is: \"$pattern\""

# Create a workspace dir in PWD so that everything sits fresh in a new folder. Tweak it if you don't want it.
dir="$$_$(date +%s)"
mkdir "${dir}" && cd "${dir}" || exit 1

# First create a temp repo file filtered by pattern and for '@' lines only (i.e. ignoring commented out lines)
tmprepofile=$(mktemp)
grep "${pattern}" "${mc_file}" | grep '@' | cut -d':' -f3- > "${tmprepofile}"

# Git clone in parallel (xargs -P5 worked best for me; -P10 can also be used).
# Clone each repo into a unique directory name (instance__org__repo) so that repos from any org on any instance never collide.
# The URL is passed to bash as $1 instead of being pasted into the command string, so the quoting stays intact.
xargs -I{} -P10 bash -c 'git clone "$1" "$(echo "$1" | cut -d@ -f2 | sed "s#:#__#g;s#/#__#g;s#\.git\$##")"' _ {} < "${tmprepofile}"
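
To see what directory name a given URL ends up with, the cut/sed step can be pulled out and tried on its own. This is just an illustration: repo_dir_name is a made-up name, and the sketch anchors the .git match to the end of the string so that only the suffix is removed.

```shell
#!/usr/bin/env bash
# Sketch of the directory-naming step, isolated for experimentation.
# "repo_dir_name" is a hypothetical helper name.
repo_dir_name() {
  # Keep the part after '@', turn ':' and '/' into '__', strip the trailing .git.
  echo "$1" | cut -d'@' -f2 | sed 's#:#__#g;s#/#__#g;s#\.git$##'
}

repo_dir_name "git@github.mycompany.com:coolrepo-org/somerepo1.git"
# prints: github.mycompany.com__coolrepo-org__somerepo1
```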

The sample master config file used is:

#-- Sample Master Config file, which can be generated using GIT rest api - against a user's org to find all user org repositories (in my case) looks like:
## github coolrepo-org org/repogroup contains:
##-----------
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo1.git
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo2.git

## somegit-instance pipeline org/repogroup contains:
##-----------
somegit-instance.mycompany.com:pipeline:git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git

## !!!!! NO ORG ACCESS REPO ENTRIES BELOW !!!!! ##
## -----------------------------------------------
## somegit-instance Misc: no org-level access, only repo-level access; entries are:
##----------- (appended to the master file at the end of master file generation script) ---------
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somerepofooter.git
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somereponav.git
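
To make the "instance:org:ssh-url" layout concrete, here is roughly how a pattern picks clone URLs out of a file like this. It mirrors the grep/cut step in the script; select_repo_urls is a made-up name, and note that grep treats the pattern as a regular expression, so a '.' matches any character.

```shell
#!/usr/bin/env bash
# Sketch: select clone URLs from a master config in "instance:org:ssh-url" form.
# "select_repo_urls" is a hypothetical helper name.
select_repo_urls() {
  local pattern=$1 config_file=$2
  # Keep lines matching the pattern, drop comment lines (they contain no '@'),
  # then cut away the "instance:org:" prefix so only the SSH URL remains.
  grep "$pattern" "$config_file" | grep '@' | cut -d':' -f3-
}
```

With the sample above, select_repo_urls "somegit-instance.mycompany.com:pipeline" master-config.txt would print only git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git (master-config.txt stands in for wherever you keep the file).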
