
Fast way to delete pattern lines from a file in shell

I have a file, file1, with approximately 60000 lines, and another, file2, with approximately 20000 lines. I need to delete from file1 every line that is present in file2. file2 also contains patterns such as .* that should delete all matching lines from file1.

file1:

ABC DEG
bhdh jdjjd
cdhhd jdjd
ABC hjj

file2:

ABC.*
cdhhd jdjd

Output should be:

bhdh jdjjd

Right now, I am using the below code.

while read -r line
do
  sed -i "/${line}/d" "$file1"
done < "$file2" 

With this code, it's taking around 30 mins to get the output. I really need a better way to delete those lines from file1.

grep is built for exactly this task:

grep -vf file2 file1

-f file2 reads the patterns from file2, one per line, and -v inverts the match, so only the lines of file1 that match none of those patterns are printed.
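As a quick check, the command can be run against the sample data from the question. This is only a sketch: the scratch directory and the variable names (workdir, result, result_whole) are made up for illustration. It also shows -x, which is worth considering because without it a pattern matches anywhere in a line, so a plain entry like "cdhhd jdjd" would also delete longer lines that merely contain that text.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'ABC DEG' 'bhdh jdjjd' 'cdhhd jdjd' 'ABC hjj' > "$workdir/file1"
printf '%s\n' 'ABC.*' 'cdhhd jdjd' > "$workdir/file2"

# -f: read one pattern per line from file2; -v: keep only non-matching lines.
result=$(grep -vf "$workdir/file2" "$workdir/file1")
printf '%s\n' "$result"

# -x: a pattern must match the whole line, not just part of it.
result_whole=$(grep -vxf "$workdir/file2" "$workdir/file1")
printf '%s\n' "$result_whole"
```

Both commands print only "bhdh jdjjd" for this data. Also make sure file2 has no empty lines: an empty pattern matches every line, so grep -v would then output nothing.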


Note: Your loop is very slow because it reads the pattern file line by line in bash and runs a separate sed command for every pattern, about 20000 in total, and each sed -i rewrites the whole of file1. Processing text with a shell loop like this is generally considered bad practice.


Note: To replace file1 with the output of the above command:

grep -vf file2 file1 > file1.tmp && mv file1.tmp file1
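One thing to watch with the && form: grep exits with status 1 when it prints no lines, so if every line of file1 gets deleted, mv is skipped and file1 stays unchanged. A minimal sketch of a more defensive variant follows, assuming mktemp is available (it is common but not required by POSIX). The function name filter_in_place and the demo paths are hypothetical.

```shell
# filter_in_place PATTERNS TARGET  (hypothetical helper, not a standard tool)
# Removes from TARGET every line matching a pattern listed in PATTERNS.
filter_in_place() {
  patterns=$1
  target=$2
  tmp=$(mktemp) || return 2
  grep -vf "$patterns" "$target" > "$tmp"
  status=$?
  # grep exits 0 if it printed lines, 1 if it printed none, >1 on error.
  if [ "$status" -le 1 ]; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return "$status"
  fi
}

# Demo on the sample data from the question.
demo=$(mktemp -d)
printf '%s\n' 'ABC DEG' 'bhdh jdjjd' 'cdhhd jdjd' 'ABC hjj' > "$demo/file1"
printf '%s\n' 'ABC.*' 'cdhhd jdjd' > "$demo/file2"
filter_in_place "$demo/file2" "$demo/file1"
cat "$demo/file1"
```

Note that the replaced file takes on the permissions of the temporary file, which mktemp creates readable and writable by the owner only, so restore the mode afterwards if that matters.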

