
shell: Get line from FILE1 by content in FILE2

I have a file (maillog) like this:

    Feb 22 23:53:39 info postfix[102]: connect from APVLDPDF01[...
    Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
    Feb 22 23:53:39 info postfix[103]: BA1D7805A1: message-id 
    Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
    Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<CORREO1@GM.COM>, relay...
    Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
    Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>,...
    Feb 22 23:53:39 info postfix[102]: disconnect from APVLDPDF01...
    Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
    Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
    Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
    Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed

and a second file (mailids) like this:

    6DBDD8039F:
    3B15BC803B:
    BA1D7805A1:
    2BD19803B4:

I want to get an output file that contains something like this:

    Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent

I only want the lines whose ID exists in the second file; in this example, BA1D7805A1: is the only ID from mailids that occurs in maillog. There is one more condition: the line must have the form "ID to=<", meaning that only lines containing both "to=<" and an ID from the second file should be output.

I've found different solutions, but I have a huge performance problem. The maillog file is 2 GB, about 10 million lines, and the mailids file has around 32,000 lines.

The process takes so long that I have never seen it finish. I've tried awk and grep commands, but I haven't found the best way.

    grep -F -f mailids maillog | grep 'to=<'

From the grep man page:

   -F, --fixed-strings
          Interpret PATTERN as a  list  of  fixed  strings,  separated  by
          newlines,  any  of  which is to be matched.  (-F is specified by
          POSIX.)

   -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)
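
To make the two-stage pipeline concrete, here is a minimal sketch on a tiny data set. The sample lines are abridged from the question and the temporary directory is an assumption made only so the example is self-contained; on real data you would point the command at your own maillog and mailids.

```shell
# Hypothetical sample data in a temporary directory (abridged from the question).
workdir=$(mktemp -d)

cat > "$workdir/maillog" <<'EOF'
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF

cat > "$workdir/mailids" <<'EOF'
6DBDD8039F:
BA1D7805A1:
EOF

# -F treats each line of mailids as a literal string, -f reads the patterns from that file.
# The second grep then keeps only the delivery lines, i.e. those containing "to=<".
result=$(grep -F -f "$workdir/mailids" "$workdir/maillog" | grep 'to=<')
printf '%s\n' "$result"
```

Only the BA1D7805A1 delivery line should survive both filters: the 59AE0805B4 line has "to=<" but its ID is not in mailids, and the other BA1D7805A1 lines lack "to=<".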

It is better to add the -w option:

   -w, --word-regexp
          Select  only  those  lines  containing  matches  that form whole
          words.  The test is that the matching substring must  either  be
          at  the  beginning  of  the  line,  or  preceded  by  a non-word
          constituent character.  Similarly, it must be either at the  end
          of  the  line  or  followed by a non-word constituent character.
          Word-constituent  characters  are  letters,  digits,   and   the
          underscore.

Here is the command I commonly use:

    grep -Fwf mailids maillog | grep 'to=<'
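
The effect of -w can be seen with a small sketch. The second log line below uses a made-up, longer ID (1BA1D7805A1:) that merely contains the wanted ID as a substring; that line and the temporary file names are assumptions for illustration, not data from the question.

```shell
# Hypothetical sample data: the second line's ID only *contains* the wanted ID.
workdir=$(mktemp -d)

cat > "$workdir/maillog" <<'EOF'
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
Feb 22 23:53:42 info postfix[919]: 1BA1D7805A1: to=<CO4@GM.COM>, status=sent
EOF

printf '%s\n' 'BA1D7805A1:' > "$workdir/mailids"

# Without -w, a plain substring match is enough, so both lines are counted.
without_w=$(grep -Fc -f "$workdir/mailids" "$workdir/maillog" || true)

# With -w, the match must not be preceded or followed by a letter, digit or underscore,
# so the line whose ID starts with an extra "1" is rejected.
with_w=$(grep -Fwc -f "$workdir/mailids" "$workdir/maillog" || true)

echo "without -w: $without_w matching lines"
echo "with -w:    $with_w matching lines"
```

In this sketch the count should drop from 2 to 1 once -w is added, which is the kind of false positive the option guards against.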

And if the ID is always in column 6, try this awk one-liner:

    awk 'NR==FNR{a[$1];next} /to=</ && ($6 in a)' mailids maillog
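
The awk one-liner can be read as two steps: load the IDs into an array while reading the first file, then test each log line against that array. The sketch below spells this out with comments on abridged sample data; the temporary directory and the array name `ids` are assumptions for illustration.

```shell
# Hypothetical sample data in a temporary directory (abridged from the question).
workdir=$(mktemp -d)

cat > "$workdir/maillog" <<'EOF'
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<correo@prueba.com>
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<CO2@GM.COM>,status=sent
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<CO3@GM.COM>, status=sent
EOF

cat > "$workdir/mailids" <<'EOF'
6DBDD8039F:
BA1D7805A1:
EOF

result=$(awk '
    # First file (mailids): NR==FNR holds only while reading it.
    # Referencing ids[$1] creates the key; next skips the remaining rules.
    NR == FNR { ids[$1]; next }

    # Second file (maillog): print lines that contain "to=<"
    # and whose 6th whitespace-separated field is a stored ID.
    /to=</ && ($6 in ids)
' "$workdir/mailids" "$workdir/maillog")

printf '%s\n' "$result"
```

Because each log line costs one array lookup rather than a comparison against every ID, this approach makes a single pass over maillog; whether it beats grep on the 2 GB file is something to measure rather than assume. Note that it relies on the ID being exactly field 6, and that NR==FNR would also hold for maillog if mailids were empty.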
