Fast way to delete pattern lines from a file in shell

Question

I have a file1 with approximately 60000 lines and a file2 with approximately 20000 lines. I need to delete from file1 every line that is present in file2. File2 also contains patterns such as .* that should delete all matching lines from file1.

file1:

ABC DEG
bhdh jdjjd
cdhhd jdjd
ABC hjj

file2:

ABC.*
cdhhd jdjd

Output should be:

bhdh jdjjd

Right now, I am using the below code.

while read -r line
do
  sed -i "/${line}/d" "$file1"
done < "$file2"

With this code, it's taking around 30 mins to get the output. I really need a better way to delete those lines from file1.

Answer 1

grep is built for exactly this task:

grep -vf file2 file1

-f file2 reads the patterns from file2, one per line, and -v prints only the lines of file1 that match none of those patterns.
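
To check the command against the sample data from the question, here is a minimal sketch. The scratch directory and the workdir and result variable names are assumptions for illustration only; file1 and file2 hold the question's sample lines.

```shell
# Recreate the sample files from the question in a scratch directory
# (workdir is a hypothetical name used only for this sketch).
workdir=$(mktemp -d)
printf '%s\n' 'ABC DEG' 'bhdh jdjjd' 'cdhhd jdjd' 'ABC hjj' > "$workdir/file1"
printf '%s\n' 'ABC.*' 'cdhhd jdjd' > "$workdir/file2"

# -f: read one pattern per line from file2
# -v: print only the lines of file1 that match none of those patterns
result=$(grep -vf "$workdir/file2" "$workdir/file1")
printf '%s\n' "$result"
# prints: bhdh jdjjd
```

grep reads file1 once and tests each line against all patterns, instead of rewriting the file once per pattern.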

Note: Your loop is very slow because it reads the patterns file line by line in a bash loop and runs a separate sed command for every pattern, so file1 is scanned and rewritten thousands of times. Processing text with a shell loop like this is generally considered bad practice.

Note: To replace file1 with the output of the above command:

grep -vf file2 file1 > file1.tmp && mv file1.tmp file1
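
Two edge cases may be worth guarding against when replacing file1 this way. A blank line in file2 is an empty pattern that matches every line, so grep -v would remove everything. Also, grep exits with status 1 when it prints no lines, so the && mv step is skipped if every line of file1 gets deleted. The following is a sketch only, not part of the original answer; the workdir, patterns and status names are assumptions for illustration.

```shell
# Sample data from the question, plus a stray blank line in file2
# (workdir is a hypothetical scratch directory for this sketch).
workdir=$(mktemp -d)
printf '%s\n' 'ABC DEG' 'bhdh jdjjd' 'cdhhd jdjd' 'ABC hjj' > "$workdir/file1"
printf '%s\n' 'ABC.*' '' 'cdhhd jdjd' > "$workdir/file2"

# Drop empty lines from the pattern list; an empty pattern matches every line.
grep -v '^$' "$workdir/file2" > "$workdir/patterns"

# Filter file1 into a temporary file and remember grep's exit status.
grep -vf "$workdir/patterns" "$workdir/file1" > "$workdir/file1.tmp"
status=$?

# Status 0: some lines were kept. Status 1: no lines were kept.
# Anything higher is a real error, so leave file1 untouched in that case.
if [ "$status" -le 1 ]; then
  mv "$workdir/file1.tmp" "$workdir/file1"
else
  rm -f "$workdir/file1.tmp"
fi

cat "$workdir/file1"
# prints: bhdh jdjjd
```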
