
Bash while VERY slow

I have a while loop that reads in a mail log file and puts it into an array so that I can search through the array and match up a mail flow. Unfortunately, the while loop takes a long time to get through the file. It is a very large file, but there must be a faster way of doing this.

cat /home/maillog |grep "Nov 13" |grep "from=<xxxx@xxxx.com>" |awk '{print $6}' > /home/output_1 

while read line; do awk -v line="$line" '$6 ~ line { print $0 }' /home/maillog >> /home/output_2 ; done < /home/output_1

Any ideas? Thanks in advance.

Let us analyse your script and try to explain why it is slow.

Let's start with a micro-optimization of your first line. It will not speed things up noticeably, so treat it as purely educational.

cat /home/maillog |grep "Nov 13" |grep "from=<xxxx@xxxx.com>" |awk '{print $6}' > /home/output_1 

This line calls four different binaries to do work that a single one can do. You could keep the line for readability, but two points are worth making:

  1. Useless use of cat. The program cat is mainly meant to concatenate files. With a single file it is overkill, especially when the output is only piped to grep, which can read the file itself.

     cat file | grep ... => grep ... file 
  2. Multiple greps in combination with awk can be written as a single awk:

     awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' 

So the entire line can be written as:

awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' /home/maillog > /home/output_1
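As a quick sanity check, the two forms can be compared on a small made-up log. The sample lines and the temporary file below are assumptions for illustration only; a real maillog may be laid out differently, but here field 6 plays the role of the queue ID.

```shell
# Hypothetical sample log; the line layout is an assumption for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
Nov 13 10:00:01 mail postfix/qmgr[123]: ABC123: from=<xxxx@xxxx.com>, size=100, nrcpt=1
Nov 13 10:00:02 mail postfix/smtp[124]: ABC123: to=<someone@example.org>, status=sent
Nov 13 10:00:03 mail postfix/qmgr[123]: DEF456: from=<other@example.org>, size=200, nrcpt=1
Nov 12 09:00:00 mail postfix/qmgr[123]: GHI789: from=<xxxx@xxxx.com>, size=300, nrcpt=1
EOF

# Original pipeline: four processes.
pipeline_out=$(cat "$log" | grep "Nov 13" | grep "from=<xxxx@xxxx.com>" | awk '{print $6}')

# Single awk: one process applying the same two filters.
awk_out=$(awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' "$log")

printf 'pipeline: %s\nawk:      %s\n' "$pipeline_out" "$awk_out"
rm -f "$log"
```

On this sample both commands select only the first line, so both variables hold the same queue ID.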

The second part is where things get slow:

while read line; do 
   awk -v line="$line" '$6 ~ line { print $0 }' /home/maillog >> /home/output_2 ;
done < /home/output_1

Why is this slow? For every line you read from /home/output_1, you load the program awk into memory, open the file /home/maillog, process every line of it and close the file again. On top of that, for every one of those lines you open /home/output_2, move the file pointer to the end of the file, write to the file and close it again.
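To get a feel for the numbers, here is a rough cost model on made-up sizes. The file contents, the 50 IDs and the 1000 log lines are all hypothetical, and the model only counts log lines scanned; it ignores the extra cost of starting awk and reopening files on every iteration.

```shell
# Rough cost model; all names and sizes here are hypothetical.
ids=$(mktemp); log=$(mktemp)

# 50 made-up IDs, standing in for /home/output_1.
awk 'BEGIN { for (i = 1; i <= 50; i++) print "ID" i ":" }' > "$ids"

# 1000 made-up log lines, standing in for /home/maillog.
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "Nov 13 10:00:00 mail postfix/qmgr[1]: ID" i ": msg" }' > "$log"

n_ids=$(wc -l < "$ids")
n_lines=$(wc -l < "$log")

# The loop rescans the whole log once per ID.
loop_work=$((n_ids * n_lines))

# Reading the log twice costs two scans, however many IDs there are.
two_pass_work=$((2 * n_lines))

echo "loop scans $loop_work lines, two passes scan $two_pass_work lines"
rm -f "$ids" "$log"
```

The gap grows with the number of IDs, which is why the loop becomes painful on a large log with many matching messages.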

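The whole program can actually be done with a single awk that reads the file twice. The first pass (NR==FNR) stores field 6 of every matching line as a key of the array a; the second pass prints every line whose field 6 is one of those keys. The filter sits inside the NR==FNR block so that nothing is printed during the first pass. Note that ($6 in a) is an exact string comparison, whereas the original $6 ~ line was a regular-expression match.

awk 'NR==FNR { if (/Nov 13/ && /from=<xxxx@xxxx.com>/) a[$6]; next } ($6 in a)' /home/maillog /home/maillog > /home/output_2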
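The two-pass idea can be tried on a small made-up log and compared with the original loop. The sample lines and temporary files are assumptions for illustration; the sketch keeps the sender filter inside the NR==FNR block so that no line is printed while the IDs are still being collected.

```shell
# Hypothetical sample data; the log layout is an assumption for illustration.
log=$(mktemp); ids=$(mktemp); loop_out=$(mktemp); awk_out=$(mktemp)
cat > "$log" <<'EOF'
Nov 13 10:00:01 mail postfix/qmgr[123]: ABC123: from=<xxxx@xxxx.com>, size=100, nrcpt=1
Nov 13 10:00:02 mail postfix/smtp[124]: ABC123: to=<someone@example.org>, status=sent
Nov 13 10:00:03 mail postfix/qmgr[123]: DEF456: from=<other@example.org>, size=200, nrcpt=1
Nov 13 10:00:04 mail postfix/smtp[125]: DEF456: to=<someone@example.org>, status=sent
Nov 13 10:00:05 mail postfix/qmgr[123]: ABC123: removed
EOF

# Original approach: collect the IDs, then rescan the log once per ID.
awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' "$log" > "$ids"
while read -r line; do
    awk -v line="$line" '$6 ~ line' "$log" >> "$loop_out"
done < "$ids"

# Two-pass approach: pass 1 fills the array, pass 2 prints matching lines.
awk 'NR==FNR { if (/Nov 13/ && /from=<xxxx@xxxx.com>/) a[$6]; next } ($6 in a)' \
    "$log" "$log" > "$awk_out"

matched=$(wc -l < "$awk_out" | tr -d ' ')
if cmp -s "$loop_out" "$awk_out"; then same=yes; else same=no; fi
echo "matched lines: $matched, identical to loop output: $same"
rm -f "$log" "$ids" "$loop_out" "$awk_out"
```

One difference to keep in mind: if the same ID is collected more than once, the loop prints its lines once per occurrence, while the two-pass awk prints each log line at most once.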
