Bash: How to read lines from a txt file, identify and remove duplicates by column value

I'm writing a Bash script that collects data from some log files, removes duplicate values, sorts the result by username, and stores the output in output.txt. What I would like to do next is read output.txt line by line and, whenever a username appears more than once, create a single new line that combines the data from those lines.

The purpose of this is to send an email to each user ONLY ONCE, informing them that they can't use this feature on this server.

I don't know if I'm explaining it well. For example, see output.txt below.

output.txt

13:49:19 DENIED: "Software_1" UserA serv7 (Can't run this feature. )
13:49:19 DENIED: "Software_2" UserA serv7 (Can't run this feature. )
15:09:14 DENIED: "Software_3" UserB serv5 (Can't run this feature. )
15:09:15 DENIED: "Software_4" UserB serv5 (Can't run this feature. )
17:20:43 DENIED: "Software_3" UserC serv5 (Can't run this feature. )
17:20:43 DENIED: "Software_5" UserC serv8 (Can't run this feature. )

Expected result

Software_1, Software_2, UserA serv7 (Can't run this feature. )
Software_3, Software_4, UserB serv5 (Can't run this feature. )
Software_3, Software_5, UserC serv5, serv8 (Can't run this feature. )

Can someone suggest a solution and explain how it works?

process.awk:

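#! /usr/bin/awk -f
# Fields: $3 = software (quoted), $4 = user, $5 = server, $6..$10 = message.
NF {
     # Append this line's software to the user's list.
     sft[$4] = sft[$4] $3 ", ";
     # Add each server only once per user. An exact-key lookup is used
     # because a regex test (srv[$4] ~ $5) would also match substrings,
     # e.g. serv1 inside serv10.
     if (!seen[$4, $5]++)
         srv[$4] = srv[$4] $5 ", ";
     mesg[$4] = $6 " " $7 " " $8 " " $9 " " $10
}

END {
    # Note: for (user in sft) visits the users in an unspecified order.
    for (user in sft) {
        gsub("\"", "", sft[user]);
        print  sft[user], user, srv[user], mesg[user];
    }
}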

This AWK script first checks that the line is not blank (using the built-in variable NF, the number of fields). If it is not blank, it maintains arrays for software and servers, indexed by the user field. For each user, it appends the software and server found on that line. This happens for every line.

When all the lines in the input file have been processed, the END block iterates over all the entries in the software array, printing the associated software, user, servers and message.

To run it:

awk -f process.awk output.txt

$ cat userlog_unparsed.log
13:49:19 DENIED: "Software_1" UserA serv7 (Can't run this feature. )
13:49:19 DENIED: "Software_2" UserA serv7 (Can't run this feature. )
15:09:14 DENIED: "Software_3" UserB serv5 (Can't run this feature. )
15:09:15 DENIED: "Software_4" UserB serv5 (Can't run this feature. )
17:20:43 DENIED: "Software_3" UserC serv5 (Can't run this feature. )
17:20:43 DENIED: "Software_5" UserC serv8 (Can't run this feature. )


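$ awk '
     # Requires GNU awk 4.0 or later: sws[user][sw] is an array of arrays.
     # Using the values as keys removes duplicates automatically.
     { sws[$4][$3]++; srvs[$4][$5]++; }
     END {
         for (user in sws) {
             swuser = ""; srvuser = "";
             # Join the keys with a leading comma, stripped below by substr().
             for (sw in sws[user])   { swuser  = swuser  "," sw }
             for (srv in srvs[user]) { srvuser = srvuser "," srv }
             print substr(swuser, 2) ", " user ", " substr(srvuser, 2);
         }
     }' userlog_unparsed.log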

"Software_2","Software_1", UserA, serv7
"Software_3","Software_4", UserB, serv5
"Software_3","Software_5", UserC, serv5,serv8

Explanation:

  1. Record each user's software and servers.
  2. At the end, loop through all the users, build each user's software and server lists, and print them.
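The one-liner above relies on arrays of arrays, a GNU awk extension, and its for-in loops return keys in an unspecified order (which is why Software_2 appears before Software_1 in the sample output). As a rough sketch, the same two steps can be written for any POSIX awk by using composite keys for deduplication and a counter to keep first-seen order. The function name group_by_user and the seen, users and nu names are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): portable variant
# that avoids the gawk-only arrays of arrays.
group_by_user() {
    awk '
    NF {
        user = $4
        if (!(user in sws)) users[++nu] = user   # step 1: record users in first-seen order
        # seen[...] uses a composite key, so each value is added once per user
        if (!seen["sw", user, $3]++)  sws[user]  = sws[user]  "," $3
        if (!seen["srv", user, $5]++) srvs[user] = srvs[user] "," $5
    }
    END {
        # step 2: print each user with the collected lists (leading comma stripped)
        for (i = 1; i <= nu; i++) {
            u = users[i]
            print substr(sws[u], 2) ", " u ", " substr(srvs[u], 2)
        }
    }' "$@"
}

# Sample data copied from userlog_unparsed.log above, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
13:49:19 DENIED: "Software_1" UserA serv7 (Can't run this feature. )
13:49:19 DENIED: "Software_2" UserA serv7 (Can't run this feature. )
15:09:14 DENIED: "Software_3" UserB serv5 (Can't run this feature. )
15:09:15 DENIED: "Software_4" UserB serv5 (Can't run this feature. )
17:20:43 DENIED: "Software_3" UserC serv5 (Can't run this feature. )
17:20:43 DENIED: "Software_5" UserC serv8 (Can't run this feature. )
EOF

group_by_user "$log"
# "Software_1","Software_2", UserA, serv7
# "Software_3","Software_4", UserB, serv5
# "Software_3","Software_5", UserC, serv5,serv8
```

The trade-off is a little more bookkeeping in exchange for portability and a stable output order.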
