
Bash while VERY slow

I have a while loop that reads in a mail log file and puts it into an array so I'll be able to search through the array and match up/search for a flow. Unfortunately, the while loop is taking a long time to get through the file; it is a very large file, but there must be another, faster way of doing this.

cat /home/maillog |grep "Nov 13" |grep "from=<xxxx@xxxx.com>" |awk '{print $6}' > /home/output_1 

while read line; do awk -v line="$line" '$6 ~ line { print $0 }' /home/maillog >> /home/output_2 ; done < /home/output_1

Any ideas? Thanks in advance.

Let us analyse your script and try to explain why it is slow.

Let's first start with a micro-optimization of your first line. It's not going to speed things up, but this is merely educational.

cat /home/maillog |grep "Nov 13" |grep "from=<xxxx@xxxx.com>" |awk '{print $6}' > /home/output_1 

In this line you make four calls to different binaries, which in the end can be done by a single one. For readability you could keep this line; however, here are the two main points:

  1. Useless use of cat. The program cat is mainly used to concatenate files. If you pass it just a single file, it is overkill, especially if all you do is pipe the result to grep:

     cat file | grep ... => grep ... file 
  2. Multiple greps in combination with awk can be written as a single awk:

     awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' 

So the entire line can be written as:

awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' /home/maillog > /home/output_1
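For reference, here is the same one-liner spread over a few lines with comments; it is functionally identical and only makes the pattern/action structure explicit:

awk '
    # A line is selected only if it contains both the date and the sender;
    # for those lines, the sixth whitespace-separated field is printed.
    /Nov 13/ && /from=<xxxx@xxxx.com>/ { print $6 }
' /home/maillog > /home/output_1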

The second part is where things get slow:

while read line; do 
   awk -v line="$line" '$6 ~ line { print $0 }' /home/maillog >> /home/output_2 ;
done < /home/output_1

Why is this slow? For each line you read from /home/output_1, you load the program awk into memory, you open the file /home/maillog, process every line of it, and close /home/maillog again. At the same time, for every line you process, you open /home/output_2, move the file pointer to the end of the file, write to it, and close the file again.
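If you want to see this overhead yourself, a rough comparison could look like the sketch below; sample.log, patterns.txt, loop_out and awk_out are placeholder names, not files from the question:

# Build a small sample to time against (placeholder file names).
head -n 5000 /home/maillog > sample.log
head -n 100 /home/output_1 > patterns.txt

# Variant 1: one awk process, and one full pass over sample.log,
# per line of patterns.txt.
time while read -r line; do
    awk -v line="$line" '$6 ~ line' sample.log
done < patterns.txt > loop_out

# Variant 2: a single awk process that reads each file exactly once
# (exact matching on field 6, as in the final command below).
time awk 'NR==FNR { a[$0]; next } ($6 in a)' patterns.txt sample.log > awk_out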

The whole program can actually be done with a single awk:

awk '(NR==FNR) && /Nov 13/ && /from=<xxxx@xxxx.com>/ {a[$6];next}($6 in a)' /home/maillog /home/maillog > /home/output2
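The same command spread out with comments may make the two-pass structure clearer; /home/maillog is listed twice on purpose so awk reads it in two passes, and NR==FNR is only true during the first pass:

awk '
    # Pass 1: for lines matching both the date and the sender,
    # remember field 6 as a key of array a and skip to the next line.
    (NR==FNR) && /Nov 13/ && /from=<xxxx@xxxx.com>/ { a[$6]; next }

    # Any line that falls through the rule above (including every line
    # of the second pass) is printed if its field 6 was remembered.
    ($6 in a)
' /home/maillog /home/maillog > /home/output2

Because only the matching lines are skipped with next during the first pass, a line whose field 6 is already stored can be printed once per pass; if that duplication matters, a guarded variant (a sketch, not the command above) uses the first pass purely for collecting keys:

awk '(NR==FNR) { if (/Nov 13/ && /from=<xxxx@xxxx.com>/) a[$6]; next } ($6 in a)' /home/maillog /home/maillog > /home/output2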
