How can I delete all files in a Git repository that are not in the working directory?

Question

I'm in the process of splitting up an old suite of applications which originally resided in a single Subversion repository.

I've converted it over to a Git repository and removed what I don't want, but I'd like to slim the repository down by getting rid of the historical data associated with the deleted files (the original repository will be maintained for reference purposes, so that data isn't needed in the new one).

Ideally, what I'd like to do is go through the entire repository and remove any files or folders not present in the working directory, along with any history associated with them. This would leave me with the contents of HEAD and a history of commits affecting those files. However, I haven't come across a way of doing this (orphaning HEAD doesn't help, as it doesn't preserve the history).

Is this possible? I know how to remove a single file or folder from the entire history via git-filter-branch, but there are too many files and folders for this to be a practical approach... unless there's a way of filtering on all files not in HEAD?

Answer 1

Here's how you can use git filter-branch to get rid of all files that you don't want:

Get a list of the filenames that you don't want to appear in the history, including both the old names and the new names in case of renames. For example, put them in a file called toberemoved.txt.

Run git filter-branch like this:

 $ git filter-branch --tree-filter "rm -f `cat toberemoved.txt`" branch1 branch2 ...

Here's the relevant excerpt from the git filter-branch man page:

   --tree-filter <command>
       This is the filter for rewriting the tree and its contents. The
       argument is evaluated in shell with the working directory set to
       the root of the checked out tree. The new tree is then used as-is
       (new files are auto-added, disappeared files are auto-removed -
       neither .gitignore files nor any other ignore rules HAVE ANY
       EFFECT!).

So just make sure that the paths in the list of files you want deleted are all relative to the root of the checked-out tree.
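The plain rm -f with an unquoted list will not remove directories and will split names that contain spaces. A minimal sketch of a more forgiving removal loop follows; the function name remove_listed and the scratch files are hypothetical, and the demonstration runs in a temporary directory rather than inside git filter-branch.

```shell
# Hypothetical helper (not part of Git): delete every path listed, one per
# line, in a list file. Paths are taken relative to the current directory.
remove_listed() {
    list_file=$1
    # "|| [ -n ... ]" also processes a final line that lacks a newline
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip blank lines; "--" guards against names starting with "-"
        [ -n "$path" ] && rm -rf -- "$path"
    done < "$list_file"
    return 0
}

# Demonstration in a scratch directory (no Git needed)
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p "old dir" keep
touch "old dir/a.txt" "legacy file.c" keep/main.c
printf '%s\n' "old dir" "legacy file.c" > "$demo_dir/toberemoved.txt"
remove_listed "$demo_dir/toberemoved.txt"
ls
```

To use the same idea as a tree filter, the loop would have to be passed as the filter string itself, with an absolute path to the list, because the filter runs inside the temporary checked-out tree: for example --tree-filter 'while IFS= read -r p; do rm -rf -- "$p"; done < /absolute/path/to/toberemoved.txt'.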

Update:

To get the list of the files that were present in the past but are not in the current working directory, you can run the following. Note that you'll have to make a further effort to keep the "history before renaming" of renamed files:

$ git log --raw | awk '/^:/ { if (! printed[$6]) { print $6; printed[$6] = 1 }}' | while read -r f; do if [ ! -f "$f" ]; then echo "Deleted: $f"; fi; done

That $6 is the name of the file that was affected in a commit, as shown in the --raw mode of git log.
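To make the field layout concrete, here is a small sketch that feeds hand-written lines in the --raw format to awk. The hashes and paths are made up for illustration. Because the path is separated from the metadata by a tab, splitting on tabs (so the path is field 2) also copes with names that contain spaces, which the whitespace-split $6 does not.

```shell
# Made-up sample of "git log --raw" entries: mode, mode, hash, hash, status,
# then a TAB and the path.
raw_sample=$(printf '%s\t%s\n' \
    ':100644 100644 1111111 2222222 M' 'src/main.c' \
    ':100644 000000 3333333 0000000 D' 'docs/old notes.txt' \
    ':100644 100644 2222222 4444444 M' 'src/main.c')

# Split on tabs: field 1 is the metadata, field 2 is the path.
# seen[] suppresses paths that were already printed.
unique_paths=$(printf '%s\n' "$raw_sample" |
    awk -F'\t' '/^:/ && !seen[$2]++ { print $2 }')

printf '%s\n' "$unique_paths"
```

For rename entries the raw line carries two tab-separated paths (old name, then new name), so a real script would need to decide which of the two to keep.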

See the --diff-filter option to git log if you want to know what happened ([D]eleted, [R]enamed, [M]odified, and so on) to each file in every commit.

Maybe others can chime in on how to find out the previous name of a tracked file in case of renames.

Answer 2

Following up on the other answer: "Maybe others can chime in on how to find out the previous name of a tracked file in case of renames."

This will return the files in your project along with the names they were renamed from.

for file in `git ls-files`; do git log --follow --name-only --pretty=format: "$file" | sort -n -b | uniq | sed '/^\s*$/d'; done

You can use this output to exclude those names from the removal list.

The whole solution is:

for file in `git ls-files`; do git log --follow --name-only --pretty=format: "$file" | sort -n -b | uniq | sed '/^\s*$/d'; done | sort -u > current.txt

git log --raw | awk '/^:/ { if (! printed[$6]) { print $6; printed[$6] = 1 }}' | while read -r f; do if [ ! -f "$f" ]; then echo "$f"; fi; done | sort > hist.txt

diff --new-line-format="" --unchanged-line-format="" hist.txt current.txt > for_remove.txt
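The --new-line-format and --unchanged-line-format options are specific to GNU diff. As a hedged alternative, the same "lines only in hist.txt" set difference can be sketched with comm, which requires both inputs to be sorted the same way. The file contents below are invented sample data, and the .sorted file names are only for illustration.

```shell
# Scratch directory with invented sample lists
work_dir=$(mktemp -d)
printf '%s\n' src/main.c src/util.c README > "$work_dir/current.txt"
printf '%s\n' src/main.c old/legacy.c docs/notes.txt README > "$work_dir/hist.txt"

# comm needs identically sorted input; LC_ALL=C keeps the ordering consistent
LC_ALL=C sort -u "$work_dir/current.txt" > "$work_dir/current.sorted"
LC_ALL=C sort -u "$work_dir/hist.txt" > "$work_dir/hist.sorted"

# -2 drops lines unique to the second file, -3 drops lines common to both,
# leaving only the paths that appear in hist but not in current
LC_ALL=C comm -23 "$work_dir/hist.sorted" "$work_dir/current.sorted" \
    > "$work_dir/for_remove.txt"

cat "$work_dir/for_remove.txt"
```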

Answer 3

I did this a couple of times: extract the commits for a single file and create a new repository from them. It goes somewhat like this:

$ c=10; for commit in $(git log --format=%h -- path/to/file|tac); do
      c=$((c+1))
      git format-patch -1 --stdout $commit > $c.patch
  done

This creates the patch files 11.patch, 12.patch and so on. I then edit these patches (using vim or perl, whichever seems best for the job), removing entire hunks for files that I'm not interested in, and maybe fixing the names as well in case of renames in the diff hunk header.

Then I'd use git am on the patches in a new git repository. If something doesn't come out right, I nuke the new git repository, edit the patches again, and repeat the git am.

The reason I start counting from 10 is that I'm too lazy to prepend a leading 0 to the patch sequence, and for more than 99 commits I just start at 99.
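If zero-padded names are preferred over picking a starting offset, printf can generate them so that the patches sort correctly regardless of how many there are. This is only a sketch of the naming step: the commit hashes are placeholders and no patches are actually written.

```shell
c=0
names=""
# Placeholder hashes standing in for the output of the git log command
for commit in aaa1111 bbb2222 ccc3333; do
    c=$((c+1))
    # %04d pads the counter to four digits: 0001, 0002, ...
    name=$(printf '%04d.patch' "$c")
    names="$names $name"
    # In the real loop this is where the patch would be written, e.g.
    # git format-patch -1 --stdout "$commit" > "$name"
done
echo "$names"
```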

Answer 1
7 2011-09-13 09:38:13

Answer 2
3 2015-08-19 15:12:44

Answer 3
3 2011-09-07 17:12:42
