shell: Get line from FILE1 by content in FILE2
I have a file (maillog) like this:
Feb 22 23:53:39 info postfix[102]: connect from APVLDPDF01[...
Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
Feb 22 23:53:39 info postfix[103]: BA1D7805A1: message-id
Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<CORREO1@GM.COM>, relay...
Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>,...
Feb 22 23:53:39 info postfix[102]: disconnect from APVLDPDF01...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
And a second file (mailids) like this:
6DBDD8039F:
3B15BC803B:
BA1D7805A1:
2BD19803B4:
I want to get an output file that contains something like this:
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Only the lines whose ID exists in the second file should be output; in this example, only the ID BA1D7805A1: from the second file appears in the first file. But there is another condition: the line must have the form "ID to=<", which means that only lines containing both "to=<" and an ID from the second file should be output.
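A small reproducible copy of the two files makes both conditions easy to check before touching the 2 GB log. The sketch below assumes a POSIX shell with mktemp available; the scratch directory is only for illustration. For every line containing "to=<" it prints the ID (the 6th whitespace-separated field) and whether mailids lists it. The per-line grep inside the loop is deliberately naive and far too slow for 10 million lines; it only makes the condition visible.

```shell
#!/bin/sh
# Illustration only: recreate the sample files from the question in a scratch
# directory (log lines copied as shown, "..." truncations included).
set -e
dir=$(mktemp -d)

cat > "$dir/maillog" <<'EOF'
Feb 22 23:53:39 info postfix[102]: connect from APVLDPDF01[...
Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
Feb 22 23:53:39 info postfix[103]: BA1D7805A1: message-id
Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<CORREO1@GM.COM>, relay...
Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>,...
Feb 22 23:53:39 info postfix[102]: disconnect from APVLDPDF01...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF

cat > "$dir/mailids" <<'EOF'
6DBDD8039F:
3B15BC803B:
BA1D7805A1:
2BD19803B4:
EOF

# Condition 1: the line contains "to=<".
# Condition 2: its ID (6th field) is an exact whole line of mailids.
# One grep per candidate line: fine for a demo, far too slow for the real log.
grep 'to=<' "$dir/maillog" | while read -r mon day clock host proc id rest; do
  if grep -qxF "$id" "$dir/mailids"; then
    echo "$id listed"
  else
    echo "$id not listed"
  fi
done > "$dir/out"
cat "$dir/out"
```

Three lines contain "to=<", but only the one with BA1D7805A1: is listed in mailids, which matches the expected output line above.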
I've found different solutions, but I have a huge performance problem. The maillog file is 2 GB, about 10 million lines, and the mailids file has around 32,000 lines.
The process takes far too long, and I've never seen it finish. I've tried awk and grep commands, but I can't find the best way.
grep -F -f mailids maillog | grep 'to=<'
From the grep man page:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by
newlines, any of which is to be matched. (-F is specified by
POSIX.)
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
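Because every wanted line must contain the literal string "to=<", one possible variation (a speed assumption, not part of the answer) is to apply that single cheap filter first, so the grep holding all the ID patterns only reads candidate lines. The tiny sample files are placeholders for the real maillog and mailids, and LC_ALL=C is an optional tweak that may help on large logs; time both orders on your own data.

```shell
#!/bin/sh
# Sketch: cheap single fixed-string filter first, then the large ID list.
# The sample files below are placeholders for the real maillog/mailids.
set -e
dir=$(mktemp -d)
printf '%s\n' \
  'Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...' \
  'Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent' \
  'Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent' \
  > "$dir/maillog"
printf '%s\n' '6DBDD8039F:' 'BA1D7805A1:' > "$dir/mailids"

# LC_ALL=C makes grep compare bytes instead of locale characters; treat it as
# an optional, measured tweak rather than a guaranteed speed-up.
LC_ALL=C grep -F 'to=<' "$dir/maillog" |
  LC_ALL=C grep -F -f "$dir/mailids" > "$dir/out"
cat "$dir/out"
```

Both pipeline orders print the same lines, since each grep keeps or drops a line independently of the other; only the amount of work done by the expensive second stage changes.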
It is better to add the -w option:
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
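To see what -w changes, here is a sketch using a made-up short ID, 7805A1:, which happens to be the tail of the real ID BA1D7805A1:. The IDs shown in the question all have ten characters, so such collisions may be rare in practice, but the example shows the man page rule in action.

```shell
#!/bin/sh
# Sketch: why -w matters. "7805A1:" is a hypothetical ID, not from the question.
set -e
dir=$(mktemp -d)
printf '%s\n' '7805A1:' > "$dir/mailids"
printf '%s\n' \
  'Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent' \
  'Feb 22 23:53:41 info postfix[918]: 7805A1: to=<CO3@GM.COM>, status=sent' \
  > "$dir/maillog"

# Without -w both lines match: "7805A1:" is a substring of "BA1D7805A1:".
grep -Fc -f "$dir/mailids" "$dir/maillog"    # prints 2
# With -w the match must be preceded by a non-word character (here a space),
# so the "BA1D7805A1:" line, where "D" precedes the match, is rejected.
grep -Fwc -f "$dir/mailids" "$dir/maillog"   # prints 1
```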
Here is the command I commonly use:
grep -Fwf mailids maillog | grep 'to=<'
And if the ID is always in column 6, try this one-line awk command:
awk 'NR==FNR{a[$1];next} /to=</&&$6 in a ' mailids maillog
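The awk version is likely fast because it is a lookup rather than a pattern scan: mailids is read once into an associative array, and each maillog line then costs one membership test on field 6. Below is the same one-liner spread out with comments and run against a trimmed copy of the sample data; the scratch directory and the array name ids are illustrative choices (the answer uses a).

```shell
#!/bin/sh
# Sketch: the awk one-liner from the answer, expanded with comments.
set -e
dir=$(mktemp -d)
printf '%s\n' '6DBDD8039F:' '3B15BC803B:' 'BA1D7805A1:' '2BD19803B4:' > "$dir/mailids"
printf '%s\n' \
  'Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...' \
  'Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent' \
  'Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent' \
  'Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed' \
  > "$dir/maillog"

awk '
  # NR == FNR holds only while reading the first file (mailids):
  # store each ID as an array key, then skip to the next line.
  NR == FNR { ids[$1]; next }
  # Second file (maillog): print lines that contain "to=<" and whose
  # 6th field, e.g. "BA1D7805A1:", is one of the stored keys.
  /to=</ && ($6 in ids)
' "$dir/mailids" "$dir/maillog" > "$dir/out"
cat "$dir/out"
```

One known caveat of the NR == FNR idiom: if mailids is empty, NR also equals FNR while reading maillog, so every log line is stored as a key instead of being tested, and the output is empty.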
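Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.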