
Big grep from a .txt list in .gz log files

This is my problem (for me, actually a big problem).

I have a .txt file with 1,130,395 lines, like the example below:

10812
10954
10963
11070
11099
10963
11070
11099
betti.bt
betti12
betti1419432307
19442407
19451970
19461949

I have about 2,000 .gz log files.

For every line of the .txt file, I need a grep to be run over all of the .gz files.

This is an example of the contents of the .gz files (two sample lines):

time=2019-02-28 00:03:32,299|requestid=30ed0f2b-9c44-47d0-abdf-b3a04dbb560e|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:juvexamore,token:b73ad88b-b201-33ce-a924-6f4eb498e01f,userIp:10.94.66.74,dtt:No|result=SUCCESS
time=2019-02-28 00:03:37,096|requestid=8ebca6cd-04ee-4818-817d-30f78ee95731|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:10963,token:1d99be3e-325f-3982-a668-30494cab9a96,userIp:10.94.66.74,dtt:No|result=SUCCESS

The .txt file contains the usernames. I need to search the .gz files to see whether each username appears in a line whose url contains "profile" and which also contains "result=SUCCESS".

If something is found, write only this to a log file: username found; name of the log file in which it was found

Is it possible to do this? I know that I need to use the zgrep command, but can someone help me? Is it possible to automate the process so I can just let it run?

Thanks, all.
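To try the answers below on a small scale before running them over 2,000 files, a tiny fixture can be built from the sample data in the question. This is a minimal sketch: the file names file.txt and log-0001.gz are assumptions chosen to match the run example in the first answer, and only a few of the usernames are used.

```shell
#!/bin/sh
# Build a tiny test fixture from the sample data in the question.
# Assumed names: file.txt (username list) and log-0001.gz (one compressed log).

# A few usernames from the list in the question, one per line.
printf '%s\n' 10812 10954 10963 betti.bt > file.txt

# The two sample log lines, written uncompressed first (quoted heredoc: no expansion).
cat > log-0001 <<'EOF'
time=2019-02-28 00:03:32,299|requestid=30ed0f2b-9c44-47d0-abdf-b3a04dbb560e|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:juvexamore,token:b73ad88b-b201-33ce-a924-6f4eb498e01f,userIp:10.94.66.74,dtt:No|result=SUCCESS
time=2019-02-28 00:03:37,096|requestid=8ebca6cd-04ee-4818-817d-30f78ee95731|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:10963,token:1d99be3e-325f-3982-a668-30494cab9a96,userIp:10.94.66.74,dtt:No|result=SUCCESS
EOF

# Compress in place: gzip replaces log-0001 with log-0001.gz.
gzip -f log-0001

# Sanity check: decompress to stdout and show the contents.
gzip -dc log-0001.gz
```

The second answer reads its list from names.txt, so for that one a copy is enough (cp file.txt names.txt).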

A rewrite using getline. It reads the usernames from file.txt and hashes them into an array, then gunzips each gzip file given as a parameter, splits each line until it reaches the field containing username:, extracts the actual username and looks it up in the hash. Not properly tested, etc., the standard disclaimer. Let me know if it worked:

$ cat script.awk
BEGIN{
    while (( getline line < ARGV[1]) > 0 ) {       # read the username file
        a[line]                                    # and hash to a
    }
    close(ARGV[1])
    for(i=2;i<ARGC;i++) {                          # read all the other files
        cmd = "gunzip --to-stdout " ARGV[i]        # form uncompress command
        while (( cmd | getline line ) > 0 ) {      # read line by line
            m=split(line,t,"|")                    # split at pipe
            if(t[m]!="result=SUCCESS")             # check only SUCCESS records
                continue
            n=split(t[6],b,/[=,]/)                 # username in 6th field
            for(j=1;j<=n;j++)                      # split to find it, set to u var:
                if(match(b[j],/^username:/)&&((u=substr(b[j],RSTART+RLENGTH)) in a)) {
                    print u,"found in",ARGV[i]     # output if found in a hash
                    break                          # exit for loop once found
                }
        }
        close(cmd)
    }
}
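To see how the script picks out the username, the same split steps can be run on the second sample line from the question. This is a sketch for illustration only; trace_record is a hypothetical helper name, and the printed labels are not part of the original script.

```shell
#!/bin/sh
# Trace how script.awk locates the username in one record.
# trace_record is a hypothetical helper used only for this illustration.
trace_record() {
    printf '%s\n' "$1" | awk '{
        m = split($0, t, "|")              # pipe-separated fields t[1]..t[m]
        print "last field: " t[m]          # the result=... field
        print "6th field:  " t[6]          # the params=... field
        n = split(t[6], b, /[=,]/)         # b[1]="params", b[2]="username:...", ...
        for (j = 1; j <= n; j++)
            if (match(b[j], /^username:/)) # RSTART/RLENGTH describe the match
                print "username:   " substr(b[j], RSTART + RLENGTH)
    }'
}

# Second sample line from the question.
line='time=2019-02-28 00:03:37,096|requestid=8ebca6cd-04ee-4818-817d-30f78ee95731|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:10963,token:1d99be3e-325f-3982-a668-30494cab9a96,userIp:10.94.66.74,dtt:No|result=SUCCESS'

trace_record "$line"
```

Because a single-character separator such as "|" is taken literally by split, the pipes do not need escaping, while the /[=,]/ regex splits the params field at both = and ,.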

Run it (here on 2 copies of the same data):

$ awk -f script.awk file.txt log-0001.gz log-0001.gz
10963 found in log-0001.gz
10963 found in log-0001.gz

I'd just do (untested):

zgrep -H 'url=/user/profile/oauth/{token}|params=username:.*result=SUCCESS' *.gz |
awk -F'[=:,]' -v OFS=';' 'NR==FNR{names[$0];next} $12 in names{print $12, $1}' names.txt - |
sort -u
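The $12 in that awk command comes from how -F'[=:,]' cuts a zgrep -H line: the file-name prefix ends at the first colon, and the colons and comma in the timestamp also act as separators, so the username lands in field 12 and the file name in field 1. The sketch below prints the first 12 fields of the sample record with an imitated log-0001.gz: prefix (a made-up file name, no .gz file needed); show_fields is a hypothetical helper. If the log layout changes, for example a field is added before params, the field number shifts.

```shell
#!/bin/sh
# Show which fields -F'[=:,]' produces for one line of zgrep -H output.
# show_fields is a hypothetical helper; the "log-0001.gz:" prefix imitates zgrep -H.
show_fields() {
    printf '%s\n' "$1" | awk -F'[=:,]' '{
        for (i = 1; i <= 12; i++)
            printf "$%d = %s\n", i, $i     # field number and its text
    }'
}

zline='log-0001.gz:time=2019-02-28 00:03:37,096|requestid=8ebca6cd-04ee-4818-817d-30f78ee95731|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:10963,token:1d99be3e-325f-3982-a668-30494cab9a96,userIp:10.94.66.74,dtt:No|result=SUCCESS'

show_fields "$zline"
```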

Or, probably a little more efficient, since it removes the NR==FNR test from every line that zgrep outputs:

zgrep -H 'url=/user/profile/oauth/{token}|params=username:.*result=SUCCESS' *.gz |
awk -F'[=:,]' -v OFS=';' '
    BEGIN {
        while ( (getline line < "names.txt") > 0 ) {
            names[line]
        }
        close("names.txt")
    }
    $12 in names{print $12, $1}' |
sort -u
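The NR==FNR test in the first pipeline works because NR counts records across all inputs while FNR restarts at 1 for each file, so the two are equal only while awk reads the first file (names.txt). After that the test is still evaluated, and fails, on every zgrep line, which is the per-line cost the BEGIN/getline version avoids. A small sketch with made-up inputs; names.txt, stream.txt and match_names are hypothetical stand-ins. One known caveat of the idiom: if the first file is empty, NR==FNR stays true for the second file too.

```shell
#!/bin/sh
# Demonstrate the NR==FNR two-file idiom used in the first pipeline.
# names.txt, stream.txt and match_names are hypothetical, for illustration only.
printf '%s\n' 10963 betti12 > names.txt
printf '%s\n' 'log-0001.gz;10963' 'log-0002.gz;11070' > stream.txt

match_names() {   # $1 = username list, $2 = lines of the form file;name
    awk -F';' '
        NR == FNR   { names[$0]; next }    # first file only: store each name as a key
        $2 in names { print $2 ";" $1 }    # later files: print name;file for known names
    ' "$1" "$2"
}

match_names names.txt stream.txt
```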

If a given user name can only appear once in a given log file, or if you actually want multiple occurrences to produce multiple output lines, then you don't need the final | sort -u.
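What the final sort -u changes is easy to see on a few made-up result lines: repeated user;file pairs collapse into one, and the output also comes out sorted. A sketch; matches.txt and its contents are hypothetical.

```shell
#!/bin/sh
# Illustrate what the final | sort -u does to repeated "user;file" lines.
# matches.txt is a hypothetical file with made-up results.
printf '%s\n' '10963;log-0002.gz' '10963;log-0001.gz' '10963;log-0002.gz' > matches.txt

echo "without sort -u:"
cat matches.txt          # three lines, one pair repeated
echo "with sort -u:"
sort -u matches.txt      # duplicates removed, lines sorted
```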

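Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.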

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM