
Drop history older than x on cvs2git migration

We plan to migrate one of our last big CVS repositories to a Git repository.

For the migration we are using cvs2git, which ships with cvs2svn. Because this CVS repository has grown over ~12 years, it holds 31GB of data.

I couldn't find any way to drop all history older than a specified period of time (2 years, for example).

Do you know of any tool, command, or solution for one of the following?

  • Drop history from the CVS repository
  • Don't export all history with cvs2git
  • Don't import all history with git fast-import
  • Drop history from the Git repository

Thanks and greetings, Andreas

Solution as suggested by Dmitry Oksenchuk: after editing the grafts file, I wrote a Bash script to clean up the messed-up tags and branches:

#!/bin/bash
# Usage: pass the new root commit as the only argument.
# Deletes every tag and local branch that does not contain that commit.

NEW_ROOT_REF=$1
[ -n "$NEW_ROOT_REF" ] || { echo "usage: $0 <new-root-commit>" >&2; exit 1; }

git tag --contains "$NEW_ROOT_REF" | sort > TAGS_TO_KEEP.tmp
echo "Keep Tags:"
wc -l < TAGS_TO_KEEP.tmp

# cut strips the two-character prefix ("* " or "  ") that git branch prints
git branch --contains "$NEW_ROOT_REF" | cut -c3- | sort > BRANCHES_TO_KEEP.tmp
echo "Keep Branches:"
wc -l < BRANCHES_TO_KEEP.tmp

git tag -l | sort > TAGS_ALL.tmp
echo "All Tags:"
wc -l < TAGS_ALL.tmp

git branch | cut -c3- | sort > BRANCHES_ALL.tmp
echo "All Branches:"
wc -l < BRANCHES_ALL.tmp

# Remove tags that appear only in the "all" list
COUNTER=0
for drop in $(comm -23 TAGS_ALL.tmp TAGS_TO_KEEP.tmp); do
        git tag -d "$drop"
        COUNTER=$((COUNTER + 1))
done
echo "Dropped tags: $COUNTER"

# Remove branches that appear only in the "all" list
COUNTER=0
for drop in $(comm -23 BRANCHES_ALL.tmp BRANCHES_TO_KEEP.tmp); do
        git branch -D "$drop"
        COUNTER=$((COUNTER + 1))
done
echo "Dropped branches: $COUNTER"

# Clean up
rm TAGS_ALL.tmp TAGS_TO_KEEP.tmp BRANCHES_ALL.tmp BRANCHES_TO_KEEP.tmp

In a well-formed Git repo, the depth of the history is usually not an issue. The Linux repo has more than 500k commits and works fine. This year we migrated a ~15-year-old CVS repo (5GB of ,v files) to Git. The Git repo takes ~200MB and contains ~70k commits.

We faced two major problems: binary files and the number of tags.

Binary files

In CVS, all revisions of a binary file are stored on the server and only the current revision is transferred on checkout. So storing binary files in CVS is not a problem at all; you just need enough disk space on the server. With Git the situation is different. When you clone a Git repo, all revisions of every binary file are transferred to your local clone. Even if a file doesn't exist in the most recent commit, all its historical revisions are in your local repo. We managed to shrink the Git repo from ~700MB to ~200MB by removing unnecessary binary files from the history. The important point here is to base your decision on the size of a file in Git, not in CVS. Git packs objects using zlib compression and delta compression, so the history of the same file can take a totally different amount of disk space in Git than in CVS. You can use the "Find large files" plugin in Git Extensions.
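If Git Extensions is not available, a similar ranking can be approximated on the command line. The following is a minimal sketch, not the plugin's implementation: the function name largest_blobs is made up for illustration, and it ranks individual blobs by their on-disk size as reported by git cat-file. For objects stored as deltas that figure is only the size of the delta, so treat the result as a rough guide.

```shell
#!/bin/bash
# largest_blobs [N]: print the N largest blobs reachable from any ref,
# one per line as "<bytes on disk> <path>", largest first (default N=20).
# The function name is hypothetical; the git commands are standard.
largest_blobs() {
    local count=${1:-20}
    # rev-list prints "<sha> <path>"; cat-file keeps the path in %(rest)
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "$count"
}
```

Run it inside the converted repository, for example `largest_blobs 50`, and look for paths that clearly do not belong in history (installers, build output, archives).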

Number of tags

We have more than 20k build tags in the CVS repo. With that many tags, both Git Extensions and SourceTree work extremely slowly (especially when they need to load all the tags into a drop-down list). git push with Git 1.9.5 was also very slow because of a performance regression that was fixed in Git 2.3.0. Currently in Git we keep build tags for only the most recent 2 years (~7k tags) and periodically archive older tags.
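How the older tags are archived is not described above, so the following is only one possible sketch. It assumes that "archiving" means moving a tag out of refs/tags into a separate namespace, here the made-up refs/archive/tags/, so that GUI tools and default pushes no longer see it while the commit stays reachable. The function name archive_old_tags is also hypothetical.

```shell
#!/bin/bash
# archive_old_tags CUTOFF: move every tag created before CUTOFF (a Unix
# timestamp) from refs/tags/ to refs/archive/tags/.
# Both the function name and the refs/archive/tags/ namespace are
# assumptions made for this sketch.
archive_old_tags() {
    local cutoff=$1
    git for-each-ref --format='%(creatordate:raw) %(objectname) %(refname)' refs/tags |
    while read -r secs _tz sha ref; do
        # Skip tags without a usable creation date
        case $secs in ''|*[!0-9]*) continue ;; esac
        if [ "$secs" -lt "$cutoff" ]; then
            name=${ref#refs/tags/}
            git update-ref "refs/archive/tags/$name" "$sha" &&
                git tag -d "$name" >/dev/null &&
                echo "archived $name"
        fi
    done
}
```

With GNU date, a two-year cutoff could be passed as `archive_old_tags "$(date -d '2 years ago' +%s)"`. Tags already pushed to a shared remote would have to be removed there separately.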

Dropping old history

If you still need to drop old history, it's much easier and safer to do so in Git than in CVS or during migration.

  1. Set the new root commit in the grafts file: echo %commit_hash% >.git/info/grafts
  2. Remove all the tags and branches that do not contain that commit (see git tag --contains and git branch --contains)
  3. Rewrite the commit graph: git filter-branch --tag-name-filter cat -- --all

Alternatively, you can parse the git-dump.dat file (the output of cvs2git, in git fast-import format) and remove old commits, tags, and branches from there.
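For the graft-based route (steps 1 and 3 of the list), a minimal sketch could look like the following. The function names pick_new_root and truncate_history are made up for illustration, and picking the root by date with git rev-list --before is an assumption about how one might choose the commit, not something prescribed above. Note that recent Git versions deprecate the grafts file in favor of git replace --graft, although the file should still be honored.

```shell
#!/bin/bash
# pick_new_root REV CUTOFF: print the newest commit reachable from REV
# whose committer date is older than CUTOFF (e.g. "2 years ago").
# Hypothetical helper; choosing the root by date is an assumption.
pick_new_root() {
    git rev-list -n 1 --before="$2" "$1"
}

# truncate_history ROOT: make ROOT a parentless commit via the grafts
# file (step 1), then make that permanent by rewriting all refs (step 3).
# Run from the top of the working tree. Step 2 (deleting tags and
# branches that do not contain ROOT) must happen in between.
truncate_history() {
    mkdir -p .git/info
    echo "$1" > .git/info/grafts
    git filter-branch --tag-name-filter cat -- --all
}
```

A typical call would be `truncate_history "$(pick_new_root master '2 years ago')"`, run on a throwaway clone first. The old objects remain on disk until the refs/original/ backups created by filter-branch are deleted, the reflog is expired, and git gc has pruned them.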
