
Converting only parts of a Subversion repository to git

I have an old Subversion repository with lots of my private projects. Parts of it were converted from an old CVS repository some years ago (with cvs2svn or similar). Its current structure looks like this:

  • trunk
    • latex
    • java
      • awt-doku
      • pps
        • build.xml
        • src
          • ant
          • de
            • dclj
              • faq
              • paul
                • (about 20 other packages)
                • ltxdoclet
                  • (some java files)
    • lua
    • (other directories)
  • branches
  • tags
  • import
A problem is that I did quite a bit of reorganization in this repository - for example, all the contents of the pps directory were once in a subdirectory of import (I think I imported them there from CVS), and there may have been other moves, too.

I'm now interested in the contents of the ltxdoclet directory together with some other files along the path, like build.xml, the ant directory and so on. I want to have their whole history, including any history from before the files were moved. And I want it as a git repository now (since I want to publish this on github). The tags and branches were never really used, so they are not important.

I do not want the rest of this repository (those parts will get separate git repositories sometime) - it would blow up my repository too much (and there is some stuff I don't want to publish).

Ideally, my resulting git repository (in the HEAD state) should look like this:

  • pps
    • build.xml
    • src
      • ant
      • de
        • dclj
          • paul
            • ltxdoclet
              • (some java files)
I don't really care about historical directory configurations, but the history should not contain any commits that did not touch any of the files in these directories (or their predecessors).


Of course, git svn seems to be the tool of choice. (Are there others?)

git svn clone seems to be the right command ... but with which options? I created an authors.txt to convert the CVS or SVN user names to my name and address. To get only the interesting files and directories, I use --ignore-paths.

This was my try:

 filter='^/xcb-src/|_00|src/resources|dclj/faq|dclj/paul/([^l]|l[^t])'
 git svn clone svn+ssh://mathe-svn/ --trunk trunk/java/pps -A authors.txt --ignore-paths="$filter" latexdoclet

Of course, it shows only the history after commit 2306, when I moved import/java-pps to trunk/java/pps ... and it has lots of commits which have no changes at all.

To solve the first problem, I thought about also giving the old directory as --trunk:

 git svn clone svn+ssh://mathe-svn/ --trunk trunk/java/pps --trunk import/java-pps -A authors.txt --ignore-paths="$filter" latexdoclet

This does not work: the first --trunk is ignored here, and the history effectively ends at commit 2305 (before the move). (It also contains lots of empty commits.)

My current try is to import the whole repository, filtering out anything not wanted:

 filter='/xcb-src/|_00|src/resources|dclj/faq|dclj/paul/([^l]|l[^t])|/esperanto|finanzen|diverses|homepage|konfig|lua|prog-aufgaben|CVSROOT|latex|tags/'
 git svn clone svn+ssh://mathe-svn/ -A authors.txt --ignore-paths="$filter" latexdoclet-neu

The conversion is still running, but there certainly are lots of commits I don't want at all.

Edit: conversion completed - I now have 2658 commits (3176 objects in git), and only about 36 of them have some interesting tree change, if I configured my gitk filter right. (Plus about 3 more which were erroneously filtered out, since our latex source file was first in the latex directory.)


  • Does anyone have better ideas on how to do this?
  • Would it be better to import the whole repository first and then use git filter-branch to pick out the files and commits I want?

Here is what I did, for reference.


After the answer from Dustin I first converted the whole svn repository to git, with

 git svn clone -A authors.txt svn+ssh://mathe-svn/ all-projects

This gave me a fairly large git repository of 24241 objects and 24 MB (after packing), from a git repository of 45 MB. As already said in a comment, both had 2658 commits in a linear history, so nothing had been lost up to this point.

Then I started to filter things out ... Of the filters offered by git filter-branch, the --index-filter one seemed to be the most useful, since it does not need to check out anything (compared to --tree-filter), and I did not want to rewrite the metadata, only remove unwanted files.

Additionally, --prune-empty would be useful, too. I also used -d /dev/shm/ebermann/git-work/tmp to put the working directory in a tmpfs, but I don't know if this really mattered, since I did no checkouts here. I used the --original option to preserve the original master reference under a new name. (Why doesn't filter-branch allow simply creating a new branch and leaving the old one intact?)

As my index filter, I used git rm --cached -r --ignore-unmatch, to which I fed a list of files and directories via xargs.

So, I had multiple calls of

git filter-branch           \
  -d /dev/shm/ebermann/git-work/tmp  \
   --index-filter "
xargs -a ~/projektoj/git-conversion/remove-liste-5.txt git rm --cached -r --ignore-unmatch 
"        \
   --original "step8"       \
   master

and

git filter-branch \
  -d  /dev/shm/ebermann/git-work/tmp  \
  --prune-empty \
  --original "step9" \
  master
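
The two kinds of calls above can also be tried in a single run on a scratch repository. The following is only a sketch under assumptions: the directories keep and drop, the commit messages, and the temporary remove list are all made up for the demonstration, and it passes the list to xargs by input redirection instead of the GNU-specific -a option. It combines --index-filter and --prune-empty in one invocation, which is not what was done above (there the pruning was a separate step).

```shell
#!/bin/sh
set -eu
# Recent git versions print a deprecation notice for filter-branch; silence it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository (all names below are hypothetical demo data).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

mkdir keep drop
echo a > keep/a.txt;  git add .; git commit -q -m "touch keep"
echo b > drop/b.txt;  git add .; git commit -q -m "touch drop only"
echo c >> keep/a.txt; git add .; git commit -q -m "touch keep again"

# One path per line, like the remove-liste files used above.
list=$(mktemp)
printf '%s\n' drop > "$list"

# Remove the listed paths from every commit's index and, in the same run,
# drop commits that end up with no change at all.
git filter-branch --prune-empty \
  --index-filter "xargs git rm --cached -r -q --ignore-unmatch < '$list'" \
  HEAD >/dev/null

commits=$(git rev-list --count HEAD)
remaining=$(git ls-tree -r --name-only HEAD)
echo "commits left: $commits"
echo "files left:   $remaining"
```

In this demo the commit that only touched drop should disappear, because after filtering its tree is identical to its parent's. Whether one combined run or separate steps is preferable probably depends on how often the remove list still changes.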

In between, I took a look at the created branch with gitk, looking for files I had forgotten before. I created the first file list from the output of svn ls svn+ssh://mathe-svn/path, removing the files/directories I wanted to retain. I later had to repeat this for older revisions, since some files had been renamed before (or more exactly, whole directory trees had been moved), so the old names did not show up. Also, some files had been removed before the current revision.

Now I have my master branch reduced to 40 revisions, and my HEAD contains 39 files and directories.

The repository (only this branch cloned into a new repository) is now only 180 KB in size (with a working tree of 288 KB). I'll now go and clean up the commit comments (which often have nothing at all to do with this project), and then publish it on github.


For the next time: is there some command which creates a list of all file paths which have ever existed in my repository (without checking out all revisions and invoking find or similar for each)? (Either for git or svn would be okay.)
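
One possible way on the git side, offered as a sketch and not as a definitive answer: git log can print the paths touched by every commit, and sort -u collapses them into one list. The scratch repository below is an assumption made for the demonstration; only the directory names are borrowed from the question. --no-renames is passed so that a moved file is reported under both its old and its new path.

```shell
#!/bin/sh
set -eu

# Throwaway repository that mimics the move from import/ to trunk/.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

mkdir -p import/java-pps
echo one > import/java-pps/build.xml
git add .
git commit -q -m "import from CVS"

mkdir -p trunk/java
git mv import/java-pps trunk/java/pps
git commit -q -m "move to trunk"

# Every path that any commit on any ref has touched:
#   --name-only       print only the paths changed by each commit
#   --pretty=format:  suppress the commit headers
#   --no-renames      report a rename as a deletion plus an addition
all_paths=$(git log --all --no-renames --name-only --pretty=format: \
  | sed '/^$/d' | sort -u)
printf '%s\n' "$all_paths"
```

On the Subversion side, svn log -v prints the changed paths of each revision, so a similar list could presumably be extracted from its output.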

Yes, learn filter-branch and do all the edits after the conversion. You can do it incrementally and reverse each step if you get it wrong.
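
Reversing a step could look roughly like this. This is a sketch on a scratch repository, not the exact procedure anyone here used: filter-branch keeps a backup of each rewritten ref under the namespace given by --original (refs/original by default), and resetting the branch to that backup undoes the step. The namespace refs/step1 and the keep/drop directories are hypothetical names chosen for the demonstration.

```shell
#!/bin/sh
set -eu
# Recent git versions print a deprecation notice for filter-branch; silence it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository (demo data only).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

mkdir keep drop
echo a > keep/a.txt
echo b > drop/b.txt
git add .
git commit -q -m "initial"

branch=$(git symbolic-ref --short HEAD)
before=$(git rev-parse HEAD)

# One filtering step; the pre-filter ref is saved under refs/step1/...
git filter-branch --original refs/step1 \
  --index-filter 'git rm --cached -r -q --ignore-unmatch drop' \
  "$branch" >/dev/null
filtered=$(git rev-parse HEAD)

# The step turned out to be wrong: go back to the saved ref.
git reset -q --hard "refs/step1/refs/heads/$branch"
restored=$(git rev-parse HEAD)

echo "before:   $before"
echo "filtered: $filtered"
echo "restored: $restored"
```

Using a different namespace for each step keeps every intermediate state reachable until the backups are deleted.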
