
Fully backup a git repo?

Is there a simple way to back up an entire git repo, including all branches and tags?

git bundle

I like that method, as it results in only one file, which is easier to copy around.
See ProGit: little bundle of joy.
See also "How can I email someone a git repository?", where the command

git bundle create /tmp/foo-all --all

is detailed:

git bundle will only package references that are shown by git show-ref: this includes heads, tags, and remote heads.
It is very important that the basis used be held by the destination.
It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.


To use that bundle, you can clone it, specifying a non-existent folder (outside of any git repo):

git clone /tmp/foo-all newFolder
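A minimal end-to-end sketch of the bundle workflow above, assuming a Unix shell and a throwaway demo repository created under a temporary directory (the file contents, the `main`/`feature` branch names and the `v1.0` tag are illustrative only). It bundles every ref into one file, checks that file with `git bundle verify`, and clones it into a folder that does not exist yet.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Demo identity so commits work on a machine without git user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
SRC="$WORK/foo"   # hypothetical stand-in for your real repository

# Build a small repo with two branches and a tag (demo data only).
git init -q "$SRC"
git -C "$SRC" symbolic-ref HEAD refs/heads/main
echo one > "$SRC/file.txt"
git -C "$SRC" add file.txt
git -C "$SRC" commit -q -m "first"
git -C "$SRC" tag v1.0
git -C "$SRC" branch feature

# Package all refs (heads, tags, remote heads) into a single file.
git -C "$SRC" bundle create "$WORK/foo-all.bundle" --all

# Check that the bundle is readable and complete for this repository.
git -C "$SRC" bundle verify "$WORK/foo-all.bundle"

# Restore: clone into a folder that does not exist yet.
git clone -q "$WORK/foo-all.bundle" "$WORK/newFolder"
git -C "$WORK/newFolder" branch -r
```

The clone lists the bundled branches as `origin/...` remote-tracking branches, the same way a clone from a server would.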

What about just making a clone of it?

git clone --mirror other/repo.git

Every repository is a backup of its remote.

Expanding on the great answers by KingCrunch and VonC,

I combined them both:

git clone --mirror git@some.origin:reponame reponame.git
cd reponame.git
git bundle create reponame.bundle --all

After that, you have a file called reponame.bundle that can be easily copied around. You can then create a new normal git repository from it using git clone reponame.bundle reponame.

Note that git bundle only copies commits that lead to some reference (branch or tag) in the repository. So dangling commits are not stored in the bundle.
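The dangling-commit limitation can be seen with a small experiment. This sketch uses a hypothetical demo repo in a temp directory: it moves `main` back one commit so the newer commit is reachable only from the reflog, bundles with `--all`, then checks which side still has that commit object.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
SRC="$WORK/repo"   # hypothetical demo repository

git init -q "$SRC"
git -C "$SRC" symbolic-ref HEAD refs/heads/main
echo keep > "$SRC/f"; git -C "$SRC" add f; git -C "$SRC" commit -q -m "reachable"
echo lost >> "$SRC/f"; git -C "$SRC" commit -q -am "about to dangle"
DANGLING=$(git -C "$SRC" rev-parse HEAD)

# Move main back one commit: the second commit is now referenced only by the reflog.
git -C "$SRC" reset -q --hard HEAD~1

# --all walks refs (and HEAD), not reflogs, so the dangling commit is left out.
git -C "$SRC" bundle create "$WORK/all.bundle" --all
git clone -q "$WORK/all.bundle" "$WORK/restored"

git -C "$SRC" cat-file -e "$DANGLING" && echo "source: present"
git -C "$WORK/restored" cat-file -e "$DANGLING" 2>/dev/null || echo "bundle clone: missing"
```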

Expanding on some other answers, this is what I do:

Set up the repo: git clone --mirror user@server:/url-to-repo.git

Then, when you want to refresh the backup, run git remote update from the clone location.

This backs up all branches and tags, including new ones that get added later, although it's worth noting that branches that get deleted do not get deleted from the clone (which, for a backup, may be a good thing).

This is atomic, so it doesn't have the problems that a simple copy would.
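A sketch of the set-up/refresh cycle, assuming a local demo repository stands in for `user@server:/url-to-repo.git` and `backup.git` is the mirror (both names are illustrative). It adds one branch and deletes another in the source, then runs `git remote update` inside the mirror. With default settings the new branch appears and the deleted one is kept, unless `fetch.prune` or `remote.origin.prune` is enabled in your configuration. `git remote update --prune` would delete it as well, if you prefer an exact copy.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
SRC="$WORK/repo"   # hypothetical stand-in for user@server:/url-to-repo.git

git init -q "$SRC"
git -C "$SRC" symbolic-ref HEAD refs/heads/main
echo a > "$SRC/a.txt"; git -C "$SRC" add a.txt; git -C "$SRC" commit -q -m init
git -C "$SRC" branch old-topic

# 1. One-time setup: a bare mirror holding every ref of the source.
git clone -q --mirror "$SRC" "$WORK/backup.git"

# Later, the source gains a branch and loses one.
git -C "$SRC" branch new-topic
git -C "$SRC" branch -q -D old-topic

# 2. Refresh, run from the mirror (e.g. from cron). No --prune, so old-topic is kept.
git -C "$WORK/backup.git" remote update
git -C "$WORK/backup.git" branch
```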

See http://www.garron.me/en/bits/backup-git-bare-repo.html

This thread was very helpful for getting some insight into how backups of git repos can be done. I think it still lacks some hints, information, or a conclusion for finding the "correct way" (tm) for oneself. Therefore I'm sharing my thoughts here to help others and putting them up for discussion to improve them. Thanks.

So, starting by picking up the original question:

  • The goal is to get as close as possible to a "full" backup of a git repository.

Then enriching it with typical wishes and specifying some preconditions:

  • Backup via a "hot-copy" is preferred to avoid service downtime.
  • Shortcomings of git will be worked around by additional commands.
  • A script should do the backup, combining the multiple steps of a single backup and avoiding human mistakes (typos, etc.).
  • Additionally, a script should do the restore, adapting the dump to the target machine; e.g. even the configuration of the original machine may have changed since the backup.
  • The environment is a git server on a Linux machine with a file system that supports hardlinks.

1. What is a "full" git repo backup?

Points of view differ on what a "100%" backup is. Here are two typical ones.

#1 Developer's point of view

  • Content
  • References

git is a developer tool and supports this point of view via git clone --mirror and git bundle --all.

#2 Admin's point of view

  • Content files
    • Special case "packfile": git combines and compacts objects into packfiles during garbage collection (see git gc)
  • git configuration
  • Optional: OS configuration (file system permissions, etc.)

git is a developer tool and leaves this to the admin. Backup of the git configuration and OS configuration should be seen as separate from the backup of the content.

2. Techniques

  • "Cold-Copy"
    • Stop the service to have exclusive access to its files. Downtime!
  • "Hot-Copy"
    • The service provides a fixed state for backup purposes. Ongoing changes do not affect that state.

3. Other topics to think about

Most of them are generic to backups.

  • Is there enough space to hold the full backups? How many generations will be stored?
  • Is an incremental approach wanted? How many generations will be stored, and when should a full backup be created again?
  • How can you verify that a backup is not corrupted after creation or over time?
  • Does the file system support hardlinks?
  • Put the backup into a single archive file or use a directory structure?

4. What git provides to back up content

  • git gc --auto

    • docs: man git-gc
    • Cleans up and compacts a repository.
  • git bundle --all

    • docs: man git-bundle, man git-rev-list
    • Atomic = "Hot-Copy"
    • Bundles are dump files and can be used directly with git (verify, clone, etc.).
    • Supports incremental extraction.
    • Verifiable via git bundle verify.
  • git clone --mirror

    • docs: man git-clone, man git-fsck, What's the difference between git clone --mirror and git clone --bare
    • Atomic = "Hot-Copy"
    • Mirrors are real git repositories.
    • The primary intention of this command is to build a full active mirror that periodically fetches updates from the original repository.
    • Supports hardlinks for mirrors on the same file system to avoid wasting space.
    • Verifiable via git fsck.
    • Mirrors can be used as a basis for a full file backup script.
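The "incremental extraction" and "verifiable" points above can be sketched together. The demo repository in a temp directory, and remembering the last backed-up commit in the shell variable `BASE`, are assumptions for illustration (a real script would store it somewhere persistent). The incremental bundle only contains objects reachable from `main` but not from `BASE`, so `git bundle verify` succeeds only in a repository that already has `BASE`.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
SRC="$WORK/repo"   # hypothetical demo repository
git init -q "$SRC"
git -C "$SRC" symbolic-ref HEAD refs/heads/main
echo 1 > "$SRC/f"; git -C "$SRC" add f; git -C "$SRC" commit -q -m c1

# Full dump, then remember which commit it covered.
git -C "$SRC" bundle create "$WORK/full.bundle" --all
BASE=$(git -C "$SRC" rev-parse main)

# New work arrives after the full backup.
echo 2 >> "$SRC/f"; git -C "$SRC" commit -q -am c2

# Incremental dump: only what is reachable from main but not from BASE.
git -C "$SRC" bundle create "$WORK/incr.bundle" "$BASE..main"

# Verify both dumps inside a repo that has the incremental bundle's prerequisite.
git -C "$SRC" bundle verify "$WORK/full.bundle"
git -C "$SRC" bundle verify "$WORK/incr.bundle"

# A mirror is a real repository, so git fsck can check its whole object graph.
git clone -q --mirror "$SRC" "$WORK/mirror.git"
git -C "$WORK/mirror.git" fsck --full
```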

5. Cold-Copy

A cold-copy backup can always do a full file backup: deny all access to the git repos, do the backup, and allow access again.

  • Possible Issues
    • It may not be easy, or even possible, to deny all access, e.g. shared access via the file system.
    • Even if the repo is on a client-only machine with a single user, the user may still commit something during an automated backup run :(
    • Downtime may not be acceptable on a server, and backing up multiple huge repos can take a long time.
  • Ideas for Mitigation:
    • Prevent direct repo access via the file system in general, even if clients are on the same machine.
    • For SSH/HTTP access, use git authorization managers (e.g. gitolite) to dynamically manage access, or modify authentication files in a scripted way.
    • Back up repos one by one to reduce downtime for each repo: deny access to one repo, do the backup and allow access again, then continue with the next repo.
    • Have a planned maintenance schedule to avoid upsetting developers.
    • Only back up when the repository has changed. This may be very hard to implement, e.g. a list of objects (keeping packfiles in mind), checksums of config and hooks, etc.
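One way to approximate "only back up when the repository has changed" is to hash what a backup has to capture and compare the result against the hash stored after the last backup. This is a sketch under stated assumptions: `fingerprint`, `needs_backup` and `run` are hypothetical helper names, the stamp-file path is illustrative, `sha256sum` comes from GNU coreutils, and refs plus HEAD, config and hooks are taken as "what matters". Refs are hashed instead of packfiles on purpose, because `git gc` repacks objects without changing any ref. Reflogs and unreachable objects are ignored.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one hash over refs, HEAD, config and hook scripts.
fingerprint() {
  local repo=$1 gitdir
  gitdir=$(git -C "$repo" rev-parse --absolute-git-dir)
  {
    git -C "$repo" for-each-ref --format='%(objectname) %(refname)'
    git -C "$repo" symbolic-ref -q HEAD || true
    cat "$gitdir/config"
    if [ -d "$gitdir/hooks" ]; then
      find "$gitdir/hooks" -type f -exec sha256sum {} + | sort
    fi
  } | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: succeeds if there is no stamp yet or the fingerprint changed.
needs_backup() {
  local repo=$1 stamp=$2
  [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$(fingerprint "$repo")" ]
}

WORK=$(mktemp -d)
REPO="$WORK/repo"                 # hypothetical demo repository
STAMP="$WORK/repo.last-backup"    # hypothetical stamp file
git init -q "$REPO"
echo a > "$REPO/a"; git -C "$REPO" add a; git -C "$REPO" commit -q -m init

# Hypothetical stand-in for the real backup job.
run() {
  if needs_backup "$REPO" "$STAMP"; then
    echo "backup"                      # the real backup would happen here
    fingerprint "$REPO" > "$STAMP"     # record only after the backup succeeded
  else
    echo "skip"
  fi
}

first=$(run)
second=$(run)
git -C "$REPO" tag v1                  # a new ref counts as a change
third=$(run)
echo "$first $second $third"
```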

6. Hot-Copy

File backups cannot be done on active repos due to the risk of data being corrupted by ongoing commits. A hot-copy provides a fixed state of an active repository for backup purposes. Ongoing commits do not affect that copy. As listed above, git's clone and bundle functionalities support this, but for a "100% admin" backup several things have to be done via additional commands.

"100% admin" hot-copy backup

  • Option 1: use git bundle --all to create full/incremental dump files of the content, and copy/back up configuration files separately.
  • Option 2: use git clone --mirror, handle and copy the configuration separately, then do a full file backup of the mirror.
    • Notes:
    • A mirror is a new repository that is populated with the current git template on creation.
    • Clean up configuration files and directories, then copy the configuration files from the original source repository.
    • The backup script may also apply OS configuration, like file permissions, to the mirror.
    • Use a filesystem that supports hardlinks and create the mirror on the same filesystem as the source repository to gain speed and reduce space consumption during backup.
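Option 2 can be sketched as a short script. The assumptions: a demo bare repository under a temp directory stands in for something like `/srv/git/project.git`; the `receive.denyNonFastForwards` setting and the `post-receive` hook are example "admin" state only; and only `config`, `hooks/` and `description` are carried over. OS-level settings such as permissions and ownership are not handled here. Because `git clone` from a local path hardlinks object files when source and target share a file system, step 1 is fast and cheap on space.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# --- Demo server repo (hypothetical stand-in for e.g. /srv/git/project.git)
git init -q "$WORK/dev"
git -C "$WORK/dev" symbolic-ref HEAD refs/heads/main
echo x > "$WORK/dev/x"; git -C "$WORK/dev" add x; git -C "$WORK/dev" commit -q -m init
SRC="$WORK/project.git"
git clone -q --bare "$WORK/dev" "$SRC"
git -C "$SRC" config receive.denyNonFastForwards true      # example server setting
mkdir -p "$SRC/hooks"
printf '#!/bin/sh\nexit 0\n' > "$SRC/hooks/post-receive"   # example custom hook
chmod +x "$SRC/hooks/post-receive"

# --- Hot copy
SNAPDIR="$WORK/snapshot"        # same file system as SRC
SNAP="$SNAPDIR/project.git"
mkdir -p "$SNAPDIR"

# 1. Content: a local mirror clone hardlinks object files instead of copying them.
git clone -q --mirror "$SRC" "$SNAP"

# 2. Configuration: drop the template-generated files and copy the originals.
rm -rf "$SNAP/hooks"
cp -a "$SRC/hooks" "$SNAP/hooks"
cp -p "$SRC/config" "$SNAP/config"   # replaces the mirror's own remote config
if [ -f "$SRC/description" ]; then cp -p "$SRC/description" "$SNAP/description"; fi

# 3. Full file backup of the snapshot, which later pushes to SRC no longer affect.
tar -C "$SNAPDIR" -czf "$WORK/project-backup.tar.gz" project.git
```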

7. Restore

  • Check and adapt the git configuration to the target machine and the latest "way of doing things" philosophy.
  • Check and adapt the OS configuration to the target machine and the latest "way of doing things" philosophy.
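A restore along these lines might look like the sketch below. The assumptions: the backup consists of a bundle plus a separately saved git config file (`saved.config` is a hypothetical name, and a single demo setting stands in for the real file), and settings are reviewed and re-applied one by one instead of copying the old config wholesale. `git clone --mirror` from the bundle brings back every bundled ref into a bare repository. The clone records the bundle file as `origin`, which a server repository does not need, so that section is removed.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# --- Demo backup input (hypothetical): a bundle plus a saved config file.
git init -q "$WORK/orig"
git -C "$WORK/orig" symbolic-ref HEAD refs/heads/main
echo v1 > "$WORK/orig/f"; git -C "$WORK/orig" add f; git -C "$WORK/orig" commit -q -m v1
git -C "$WORK/orig" branch release
git -C "$WORK/orig" bundle create "$WORK/backup.bundle" --all
git config --file "$WORK/saved.config" receive.denyNonFastForwards true

# --- Restore content as a bare repository holding every bundled ref.
git clone -q --mirror "$WORK/backup.bundle" "$WORK/restored.git"
git -C "$WORK/restored.git" config --remove-section remote.origin

# --- Review the saved settings; re-apply only what still fits this machine.
git config --file "$WORK/saved.config" --list
git -C "$WORK/restored.git" config receive.denyNonFastForwards \
  "$(git config --file "$WORK/saved.config" receive.denyNonFastForwards)"

git -C "$WORK/restored.git" fsck
```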

Use git bundle, or clone.

Copying the git directory is not a good solution because it is not atomic. If you have a large repository that takes a long time to copy and someone pushes to your repository during the copy, it will affect your backup. Cloning or making a bundle will not have this problem.

The correct answer IMO is git clone --mirror. This will fully back up your repo.

git clone --mirror will clone the entire repository (notes, heads, refs, etc.) and is typically used to copy an entire repository to a new git server. This will pull down all branches and everything: the entire repository.

git clone --mirror git@example.com:your-repo.git

  • Normally, cloning a repo creates only the default branch (e.g. master) as a local branch; the other branches exist only as remote-tracking branches.

  • Copying the repo folder will only "copy" the branches that have been pulled in, so by default that is the master branch only, plus any other branches you have checked out previously.

  • The git bundle command is also not what you want: "The bundle command will package up everything that would normally be pushed over the wire with a git push command into a binary file that you can email to someone or put on a flash drive, then unbundle into another repository." (From What's the difference between git clone --mirror and git clone --bare)
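The difference between a normal clone and a mirror shows up in which refs each one creates. A sketch with a hypothetical demo repo (branches `main` and `feature`) in a temp directory. In the normal clone, `feature` exists only as `refs/remotes/origin/feature`, with no local branch. In the mirror, it is a branch of its own under `refs/heads/`.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
SRC="$WORK/repo"   # hypothetical demo repository
git init -q "$SRC"
git -C "$SRC" symbolic-ref HEAD refs/heads/main
echo a > "$SRC/a"; git -C "$SRC" add a; git -C "$SRC" commit -q -m init
git -C "$SRC" branch feature

git clone -q "$SRC" "$WORK/normal"
git clone -q --mirror "$SRC" "$WORK/mirror.git"

# List every ref each copy holds.
echo "normal clone:"
git -C "$WORK/normal" for-each-ref --format='  %(refname)'
echo "mirror:"
git -C "$WORK/mirror.git" for-each-ref --format='  %(refname)'
```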

Everything is contained in the .git directory. Just back that up along with your project, as you would any file.

You can back up the git repo with git-copy at minimum storage size.

git copy /path/to/project /backup/project.repo.backup

Then you can restore your project with git clone:

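git clone /backup/project.repo.backup project

# Create a bare copy on the backup drive (bare, so it can accept pushes)
git clone --bare /path/to/repo /path/to/backupdir/repo.git
cd /path/to/repo
# Register the copy as a remote and make master track it
git remote add backup /path/to/backupdir/repo.git
git push --set-upstream backup master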

This creates a backup and sets things up so that you can do a git push to update your backup, which is probably what you want to do. Just make sure that /path/to/backupdir and /path/to/repo are at least on different hard drives; otherwise it doesn't make much sense to do this.
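If the backup should carry every branch and tag rather than only `master`, the push remote can be configured as a mirror. This is a sketch, assuming a bare repository on the second drive; paths under a temp directory stand in for `/path/to/repo` and `/path/to/backupdir`. `git remote add --mirror=push` makes every plain `git push backup` behave like `git push --mirror`. New refs get copied, and refs deleted locally are also removed from the backup, unlike the fetch-based mirror refresh described earlier.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
REPO="$WORK/repo"                   # stands in for /path/to/repo
BACKUP="$WORK/backupdir/repo.git"   # stands in for a bare repo on another drive

git init -q "$REPO"
git -C "$REPO" symbolic-ref HEAD refs/heads/main
echo a > "$REPO/a"; git -C "$REPO" add a; git -C "$REPO" commit -q -m init
git -C "$REPO" branch feature
git -C "$REPO" tag v1

mkdir -p "$(dirname "$BACKUP")"
git init -q --bare "$BACKUP"

# Push-mirror remote: no fetch refspec, and "git push backup" mirrors all refs.
git -C "$REPO" remote add --mirror=push backup "$BACKUP"
git -C "$REPO" push -q backup

# Later: one branch is deleted and another created; the next push mirrors both changes.
git -C "$REPO" branch -q -D feature
git -C "$REPO" branch hotfix
git -C "$REPO" push -q backup
```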

Here are two options:

  1. You can directly take a tar of the git repo directory, as it has the whole bare contents of the repo on the server. There is a slight possibility that somebody may be working on the repo while you take the backup.

  2. The following command will give you a bare clone of the repo (just like it is on the server); then you can take a tar of the location where you cloned it without any issue.

     git clone --bare {your backup local repo} {new location where you want to clone}

If it is on GitHub, navigate to Bitbucket and use the "Import repository" method to import your GitHub repo as a private repo.

If it is on Bitbucket, do it the other way around.

It's a full backup but stays in the cloud, which is my ideal method.

There is a very simple-to-use Python tool that automatically backs up organisations' repositories in .zip format, saving public and private repositories and all their branches. It works with the GitHub API: https://github.com/BuyWithCrypto/OneGitBackup

As far as I know, you can simply copy the directory your repo is in, and that's it!

cp -r project project-backup
