Fast way to delete pattern lines from a file in shell
I have a file1 with approximately 60,000 lines and a file2 with approximately 20,000 lines. I need to delete the lines present in file2 from file1. file2 also contains patterns such as .* so that similar lines are deleted from file1 as well.
file1:
ABC DEG
bhdh jdjjd
cdhhd jdjd
ABC hjj
file2:
ABC.*
cdhhd jdjd
Output should be:
bhdh jdjjd
Right now, I am using the code below.
# Runs one sed pass over file1 for every pattern in file2.
while IFS= read -r line
do
    sed -i "/${line}/d" "$file1"
done < "$file2"
With this code, it takes around 30 minutes to get the output. I really need a faster way to delete those lines from file1.
grep does exactly this:
grep -vf file2 file1
-f file2 reads the patterns from file2, one per line, and -v inverts the match, so grep prints only the lines of file1 that match none of the patterns in file2.
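To check the command against the sample data from the question, the following sketch recreates both files in a scratch directory and runs it. The workdir and result variable names are only illustrative.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'ABC DEG' 'bhdh jdjjd' 'cdhhd jdjd' 'ABC hjj' > "$workdir/file1"
printf '%s\n' 'ABC.*' 'cdhhd jdjd' > "$workdir/file2"

# -f reads the patterns from file2; -v keeps only lines matching none of them.
result=$(grep -vf "$workdir/file2" "$workdir/file1")
printf '%s\n' "$result"
```

This prints bhdh jdjjd, the expected output from the question. The patterns are treated as regular expressions, which is what makes the ABC.* line remove both ABC lines.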
Note: Your loop is very slow because it reads the patterns file line by line in a bash loop and executes thousands of sed commands, one for every pattern, and each sed -i call rereads and rewrites all of file1. See also here for more on why this is bad practice.
Note: To replace file1 with the output of the above command:
grep -vf file2 file1 > file1.tmp && mv file1.tmp file1
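One detail worth knowing about the one-liner: grep exits with status 1 when it selects no lines, so if every line of file1 matches a pattern, the && prevents the mv and file1 is left unchanged. A minimal sketch of a wrapper that treats that case as success is shown below; delete_matching_lines is a hypothetical helper name, not something from the answer above.

```shell
# Hypothetical helper: remove from file "$1" every line that matches
# any pattern listed in file "$2", replacing "$1" with the result.
delete_matching_lines() {
    target=$1
    patterns=$2
    status=0

    # Temporary file next to the target so that mv stays on one filesystem.
    # Note: mktemp creates it with restrictive permissions, which the
    # replaced file will inherit.
    tmp=$(mktemp "${target}.XXXXXX") || return 1

    grep -vf "$patterns" "$target" > "$tmp" || status=$?

    # grep exits 0 when lines were selected, 1 when none were, >1 on error.
    if [ "$status" -le 1 ]; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return "$status"
    fi
}
```

Usage would be delete_matching_lines file1 file2. On a real grep error (for example a missing patterns file) the original file is left untouched and the temporary file is removed.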