Fast way to delete pattern lines from a file in shell

Question

I have a file1 with approximately 60000 lines and a file2 with approximately 20000 lines. I need to delete the lines present in file2 from file1. file2 also contains patterns using .* to delete similar lines from file1.

file1:

ABC DEG
bhdh jdjjd
cdhhd jdjd
ABC hjj

file2:

ABC.*
cdhhd jdjd

Output should be:

bhdh jdjjd

Right now, I am using the code below.

while read -r line
do
  sed -i "/${line}/d" $file1
done < "$file2"

With this code, it takes around 30 minutes to get the output. I really need a better way to delete those lines from file1.

Answer 1

grep is built for exactly this task:

grep -vf file2 file1

-f file2 reads the patterns from file2, one per line, and -v excludes the lines of file1 that match any of those patterns.
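As a quick check, the sample data from the question can be run through the command. This is a minimal sketch: the scratch directory and the result variable are assumptions made for illustration, not part of the original answer.

```shell
#!/bin/sh
# Recreate the question's sample data in a scratch directory (illustrative setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'ABC DEG' 'bhdh jdjjd' 'cdhhd jdjd' 'ABC hjj' > file1
printf '%s\n' 'ABC.*' 'cdhhd jdjd' > file2

# -f file2: read one pattern per line; -v: print only the lines matching none of them.
result=$(grep -vf file2 file1)
printf '%s\n' "$result"   # prints: bhdh jdjjd
```

One thing worth checking before running this on real data: a blank line in file2 acts as an empty pattern, which matches every line, so the output would come back empty.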

Note: Your loop is very slow because it reads the patterns file line by line in a bash loop and runs a separate sed command for every pattern (thousands of them), each of which rewrites file1 in place. See also here for more on why this is bad practice.

Note: To replace file1 with the output of the above command:

grep -vf file2 file1 > file1.tmp && mv file1.tmp file1
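One edge case with the && form: grep exits with status 1 when it prints no lines, so if every line of file1 matches a pattern, mv is skipped and file1 stays unchanged. The sketch below is one possible way to handle that, assuming an empty result should still replace file1; the sample data and variable names are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: every line of file1 matches a pattern in file2.
printf '%s\n' 'ABC DEG' 'ABC hjj' > file1
printf '%s\n' 'ABC.*' > file2

# Capture grep's exit status instead of chaining with &&.
status=0
grep -vf file2 file1 > file1.tmp || status=$?

# grep exits 0 when it printed lines, 1 when it printed none, and >1 on error.
if [ "$status" -le 1 ]; then
  mv file1.tmp file1
else
  rm -f file1.tmp
  echo "grep failed with status $status" >&2
fi
```

If an empty result should instead leave file1 untouched, the original && form already behaves that way, although it leaves file1.tmp behind.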

Solution 1
2 2020-09-24 08:45:11
