
How do I clone a large Git repository on an unreliable connection?

I want to clone LibreOffice. From the official website, this is what's written:

All our source code is hosted in git:

Clone: $ git clone git://anongit.freedesktop.org/libreoffice/core # (browse)

Clone (http): $ git clone http://anongit.freedesktop.org/git/libreoffice/core.git # slower

Tarballs: http://download.documentfoundation.org/libreoffice/src/

please find the latest versions (usually near the bottom)

Now, when I run this command in Git Bash, it starts fetching. But the repository is so big that, after hours of downloading, I lose connectivity for a few seconds, Git rolls back the download, and I end up with nothing.

Is there any way I can download the repository smoothly even if interruptions occur?

PS I am a new user of Git and I use a 1 MB DSL internet connection. The repository must be over 1 GB.

The repository is accessible via the http protocol (aka dumb protocol) here: http://anongit.freedesktop.org/git/libreoffice/core.git.

You can download everything there with wget or another download manager, and you'll have a clone of the repository. After that, rename the directory from core.git to .git, and use the following commands to tell Git about the remote URL and check out the working tree:

$ git remote add remote http://anongit.freedesktop.org/git/libreoffice/core.git
$ git reset --hard HEAD
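The download step could be sketched roughly as follows. This is only an illustration, not part of the original answer: the function name mirror_git_dir, its arguments, and the RETRY_DELAY variable are made up here, and it assumes the server exposes directory listings, which recursive wget needs in order to discover files.

```shell
#!/bin/sh
# Hypothetical helper: mirror a repository served over dumb HTTP into
# <dest>/.git, re-running wget until it exits cleanly or tries run out.
# Assumption: the server publishes directory listings for the repository.
mirror_git_dir() {
    local url="$1" dest="$2" cut_dirs="${3:-3}" max_tries="${4:-100}"
    local try=1
    mkdir -p "$dest/.git" || return 1
    while [ "$try" -le "$max_tries" ]; do
        # --recursive --level=inf : follow every link below the start URL
        # --no-parent             : never climb above the repository directory
        # --continue              : resume partially downloaded files
        # --no-host-directories, --cut-dirs : store files directly under .git/
        # --reject                : discard the generated listing pages
        if wget --recursive --level=inf --no-parent --continue \
                --no-host-directories --cut-dirs="$cut_dirs" \
                --reject 'index.html*' \
                --directory-prefix="$dest/.git" "$url/"; then
            return 0
        fi
        echo "wget failed (try $try of $max_tries); retrying..." >&2
        sleep "${RETRY_DELAY:-10}"
        try=$((try + 1))
    done
    return 1
}
```

For the URL above, a call might look like mirror_git_dir http://anongit.freedesktop.org/git/libreoffice/core.git libreoffice-core (three leading path components are cut by default). Running git fsck inside the new directory afterwards is a cheap way to check that nothing was truncated. If git reset --hard then complains that it must be run in a work tree, the mirrored config file probably still says bare = true, and git config core.bare false should fix that.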

Do 'git clone --depth 100'. It should fetch only the last 100 commits.

You can do the following:

git clone --depth 1 git@github.com:User/Project.git .
git fetch --unshallow

The first clone will still be atomic, so if your connection is not reliable enough to fetch the current HEAD then you will have trouble.

The subsequent fetch should be incremental and retryable if the connection drops halfway through.
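If even a single git fetch --unshallow is too large to survive the connection, one variation is to deepen the history in small steps and retry each step on failure. The sketch below is an assumption-laden illustration rather than part of the answer: the function name deepen_until_complete and its default step size are made up, and it relies on git fetch --deepen and git rev-parse --is-shallow-repository, which need a reasonably recent Git (2.15 or later).

```shell
#!/bin/sh
# Hypothetical helper: run inside a shallow clone. Fetches history in chunks
# of <step> commits until the repository is no longer shallow.
# Note: it retries forever on failure, so interrupt it if the remote is gone.
deepen_until_complete() {
    local step="${1:-500}" delay="${2:-10}"
    # rev-parse prints "true" while the repository is still shallow
    while [ "$(git rev-parse --is-shallow-repository)" = "true" ]; do
        if git fetch --deepen="$step"; then
            echo "Deepened history by up to $step commits"
        else
            echo "Fetch failed; retrying in ${delay}s..." >&2
            sleep "$delay"
        fi
    done
}
```

Each successful step is kept, so a dropped connection only costs the chunk that was in flight. A smaller step means less lost work per failure at the price of more round trips.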

The best method that I know of is to combine a shallow clone (--depth 1) with sparse checkout, that is, checking out only the subfolders or files that you need. (Shallow cloning also implies --single-branch, which is also useful.) See udondan's answer for an example.

Additionally, I use a bash loop to keep retrying until finished successfully. Like this:

#!/bin/bash

git init <repo_dir>
cd <repo_dir>
git remote add origin <repo_url>

# Optional step: sparse checkout
git config core.sparsecheckout true                     # <-- enable sparse checkout
echo "subdirectory/*" >> .git/info/sparse-checkout      # <-- specify files you need

# Keep pulling until successful
until git pull --depth=1 origin master; do              # <-- shallow fetch; loop until exit status is 0
    echo "Pulling git repository failed; retrying..."
done

In this way I can eventually pull large repos even with slow VPN in China…

Importantly, by pulling this way you will still be able to push.

Increase the buffer size so that Git can use your bandwidth properly. Use the following commands:

git config --global core.compression 0

git config --global http.postBuffer 1048576000

git config --global http.maxRequestBuffer 100M

git clone <repo url>

Wait until the clone completes.

I used my web hosting server, which has shell access, to clone the repository first and then used rsync to copy it locally. When resumed, rsync copies only the remaining files.
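The copy step might look something like the sketch below. The function name pull_with_rsync, the retry limits, and the example host and paths are all hypothetical; the answer does not say which rsync options were used.

```shell
#!/bin/sh
# Hypothetical helper: copy a clone from a remote host with rsync, retrying
# after a dropped connection until the copy succeeds or tries run out.
pull_with_rsync() {
    local src="$1" dest="$2" max_tries="${3:-50}" delay="${4:-10}"
    local try=1
    while [ "$try" -le "$max_tries" ]; do
        # --archive : recurse and preserve permissions and timestamps
        # --partial : keep half-transferred files so the next run can reuse them
        if rsync --archive --partial "$src" "$dest"; then
            return 0
        fi
        echo "rsync failed (try $try of $max_tries); retrying in ${delay}s..." >&2
        sleep "$delay"
        try=$((try + 1))
    done
    return 1
}
```

With an assumed host and path, a call could be pull_with_rsync user@myhost:clones/core/ ./core/. Files that already arrived intact are skipped on each retry, which is what makes this approach tolerant of interruptions.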
