
How to deal with a large number of nested CVS projects

Posted 2016-12-23 12:15:46. Tags: git, cvs2svn, cvs2git

I've never done this before, so I'm probably just being a noob... I'm trying to migrate our aged CVS repository to GitLab and I'm not sure how to handle the nested CVS projects. We have a LOT of them (i.e. about 1600 .project files dotted through the CVS repo). There's about 10 years' worth of commits, totalling about 21GB, across two CVS repository directories.

The general structure is $client/$product, but most of these contain a bunch of subprojects - often very many.

What I've tried so far:

1. Monolithic: I tried to import the smaller CVS repo. It ran out of memory on pass 1 the first time (solved by adding memory) and ran out of disk space on pass 5 the second time (I can't really add disk, as the vmware datastores are nearly full - don't ask!).
2. By client: cvs2git completed on one client, and I then ran git fast-import, but then I noticed all the sub-projects. Git doesn't care about the merged history, but our coders will. I read up on git submodules, but I'm not sure they are what I need, as the entire project is normally within the same CVS repo, and I see they complicate the process of cloning the project.
3. By project within client: using the output from (2), I recursed the CVS repo depth-first with find, looking for .project files; created a subdirectory for each and did a git init --bare in each, before importing the sub-projects with git fast-import. This took ages, as I believe it has to munge the entire cvs2git blob and dump files every time, and I'm not sure I ended up with a proper git hierarchy.
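
The discovery step in the third attempt can be sketched roughly as follows. This is only a minimal sketch, assuming a standard CVS repository layout in which each file is stored as an RCS `,v` file (so `.project` shows up as `.project,v`, possibly inside an `Attic` directory if it was later removed). The function names `find_project_dirs` and `nearest_parent_project` are hypothetical, chosen for illustration.

```python
import os


def find_project_dirs(repo_root):
    """Return sorted paths (relative to repo_root) of directories that hold a .project,v file."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        if ".project,v" in filenames:
            project_dir = dirpath
            # CVS keeps files removed from the trunk in an Attic subdirectory,
            # so the project directory is the Attic's parent.
            if os.path.basename(dirpath) == "Attic":
                project_dir = os.path.dirname(dirpath)
            found.add(os.path.relpath(project_dir, repo_root))
    return sorted(found)


def nearest_parent_project(projects):
    """Map each project path to the closest enclosing project, or None if it is top-level."""
    known = set(projects)
    parents = {}
    for project in projects:
        parent = os.path.dirname(project)
        # Walk upwards until another known project (or the repo root) is reached.
        while parent and parent not in known:
            parent = os.path.dirname(parent)
        parents[project] = parent or None
    return parents
```

Listing which projects sit inside which others, before converting anything, may help in deciding whether a nested project should become its own git repository or stay inside its parent.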

So... rather than floundering around any more, I thought I'd ask here, as I'm sure someone else must have needed to do this kind of thing. Any pointers greatly appreciated.

[edit]: Thanks for all the suggestions and help, people. It's out of my hands now - they (the devs) have decided to migrate the CVS projects piecemeal as they work on them, so the majority will probably never be moved. The old CVS will be kept around as a read-only reference for that purpose, and projects will be checked in to git "pristine", so for any "BG" (before git) history they will refer to CVS, but for "AG" history they will consult git.

As for the issue of the deeply nested projects, the explanation I was given is that it relates to Java class hierarchies, and each project equates to one class. There's something in their build process that automatically turns CVS projects into Java .jar files or something like that. There's a LOT of Java in there.

2 answers

I'm not quite sure what you're asking, but here are some comments, hopefully one or more of which will answer your question.

Did you want to convert each individual project separately to git? I can't really tell from your question. But if you do, you can just copy each project's directory tree and run cvs2git on it. (Or perhaps even just create symlinks to save space, so long as the nesting allows it.) Loop over them one at a time. The simplicity of CVS's server-side back-end file storage is a blessing in this case.

E.g. something like this. Note that you could do a recursive copy rather than a symlink.

/opt/cvsrepos/CVSROOT
             /path/to/project1
                     /project2

/opt/convertrepos/CVSROOT #dummy empty directory to fool cvs2git
                 /project1 -> /opt/cvsrepos/path/to/project1

Can you just copy the whole CVS repository somewhere else temporarily to do the conversion, somewhere you have more disk space and memory?
Whether you want to create one monolithic repository or lots of separate repositories is an opinion-based question that is beyond the purpose of stackoverflow. It is also not clear to me whether these projects require each other or not. If not, then you have more flexibility in that choice.

Usually it is not possible to preserve all the information contained in a centralized repository, especially one as imperfect as CVS, when converting to git. So I think you should not try at all. Preserve the original repository for historical reference, and convert to git only the projects which are currently in development. You don't even have to import their whole 10 years of history; 2-3 years would be enough.