Handle large repository with binaries in git-svn

At my workplace there is a large SVN repository (more than 80,000 revisions) with lots of binary files. I am experimenting with git-svn on it, but cloning the whole history seems impractical: it takes more than 100 GB and nearly a week to complete.

I have tried cloning only a subset of revisions (roughly the last 10,000), and that works reasonably well. The main drawback of this approach is that blame only goes back as far as the oldest revision I fetched.
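A minimal sketch of how such a partial clone can be scripted, assuming the `-r START:HEAD` revision range that `git svn clone` passes on to its fetch step. The URL, revision number, and directory name are hypothetical placeholders, not values from the question.

```python
import shlex


def partial_clone_command(svn_url, start_revision, target_dir):
    """Build a `git svn clone` command that fetches only start_revision..HEAD."""
    if start_revision < 1:
        raise ValueError("start_revision must be a positive SVN revision number")
    return [
        "git", "svn", "clone",
        "-r", f"{start_revision}:HEAD",  # skip everything older than start_revision
        svn_url,
        target_dir,
    ]


# Hypothetical values: keep roughly the last 10,000 of 80,000 revisions.
cmd = partial_clone_command("https://svn.example.com/repo/trunk", 70000, "repo-git")
print(shlex.join(cmd))
# prints: git svn clone -r 70000:HEAD https://svn.example.com/repo/trunk repo-git
```

The list form can be handed to `subprocess.run(cmd, check=True)` directly. The trade-off stays the same as described above: history older than the starting revision is simply absent from the Git clone.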

Ideally, I would like to clone the whole history for source files and only the last thousand revisions for binaries. Is that somehow possible? Any other suggestions?

I've run into the same issue at my workplace, so I'll share my solution.

The solution was not, unfortunately, to do what you're envisioning (though I did originally think of that too). The solution is to refactor the repository, separating binaries from sources. This is easier said than done, as you will need to get your department on board and it will impact your team's workflow, but if you can pull it off, it will be worth it.

There are really three types of files to consider:

  • Sources should be isolated in their own repository. That's simple enough to understand.
  • 3rd party binaries may also be committed to the repository, though importing them through svn:externals avoids lots of potential duplication. These binaries aren't so bad because they won't accumulate much history.
  • Generated binaries (outputs of your compilation) are the worst by far! These change with every compilation, and maintaining their history doesn't make sense. Version control systems aren't intended for dealing with this. Some companies love committing binaries because they can check out the latest load without compiling it, but there is a huge cost.
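Before separating the repository, it helps to know which files in a working copy are binaries at all. The following is a rough sketch, not part of the original answer: it treats any file with a NUL byte in its first few kilobytes as binary, which is a common heuristic but will misclassify some files (UTF-16 text, for example). The function names are made up for illustration.

```python
import sys
from pathlib import Path


def looks_binary(path, sample_size=8192):
    """Heuristic: a NUL byte near the start of the file suggests binary content."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def split_sources_and_binaries(root):
    """Walk a working copy and return (sources, binaries) as sorted lists of paths."""
    sources, binaries = [], []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and version-control metadata.
        if not path.is_file() or ".svn" in path.parts or ".git" in path.parts:
            continue
        (binaries if looks_binary(path) else sources).append(path)
    return sources, binaries


if __name__ == "__main__":
    # Usage: python classify.py <path-to-working-copy>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    sources, binaries = split_sources_and_binaries(root)
    print(f"{len(sources)} text files, {len(binaries)} binary files")
```

The resulting list only says what is binary, not whether it is third-party or generated; that distinction still has to come from knowledge of the build.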

The solution that I've been implementing is to make all binaries in a major product build and package from a single command. A build machine then builds, packages, and archives nightly (or on-demand) builds. People can get the latest binaries from the build machine, and as long as the packages are install-friendly, that is even easier than doing an svn up because there are far fewer updates, conflicts, and merges to deal with. This takes generated binaries completely out of SVN.
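The archiving step might look something like the sketch below. It assumes the build has already written its outputs to a directory and simply zips that directory under a dated name; the function name, the product name, and the directory layout are all hypothetical, and a real setup would more likely live in a CI or build-server job.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_build(build_dir, archive_root, product, build_date=None):
    """Zip build_dir into archive_root/<product>-<YYYY-MM-DD>.zip and return its path."""
    build_date = build_date or date.today()
    archive_root = Path(archive_root)
    archive_root.mkdir(parents=True, exist_ok=True)
    base_name = archive_root / f"{product}-{build_date.isoformat()}"
    # make_archive appends the ".zip" extension and returns the final file name.
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=build_dir))


if __name__ == "__main__":
    # Hypothetical layout: outputs in ./build/out, archives kept in ./nightly.
    print(archive_build("build/out", "nightly", "myproduct"))
```

Keeping one dated archive per build gives people a stable place to fetch "the latest load" from, without any of it entering the SVN history.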
