awk + filtering a log file

Question

I use the following awk command to filter out duplicate lines.

Example:

cat LogFile | awk '!seen[$0]++'

The problem is that in some cases we need to treat lines as duplicates even though some of their fields differ, because those fields are not important.

For example:

LogFile:

 [INFO],[02/Jun/2014-19:30:45],EXE,ds1a,INHT VERION , 1.4.4.3-08
 [INFO],[02/Jun/2014-19:31:25],EXE,ds1a,INHT VERION , 1.4.4.3-08
 [INFO],[02/Jun/2014-19:32:40],EXE,ds1a,INHT VERION , 1.4.4.3-08

Please take a look at this file, LogFile.

I need to remove lines that are duplicates from the third "," delimiter to the end of the line, regardless of what comes before the third delimiter.

So in the end I should get this filtered file (the first line of each group of duplicates should always be the one kept):

    [INFO],[02/Jun/2014-19:30:45],EXE,ds1a,INHT VERION , 1.4.4.3-08

Please help me complete this task.

How can I filter LogFile on everything after the third "," delimiter, ignoring the fields [INFO],[...........],EXE, ?

Remark: the implementation can also be a perl one-liner.

Answer 1

With GNU awk for gensub():

$ awk '!seen[gensub(/([^,]*,){3}/,"","")]++' file
[INFO],[02/Jun/2014-19:30:45],EXE,ds1a,INHT VERION , 1.4.4.3-08

With any awk that supports RE intervals (most modern awks):

$ awk '{key=$0; sub(/([^,]*,){3}/,"",key)} !seen[key]++' file
[INFO],[02/Jun/2014-19:30:45],EXE,ds1a,INHT VERION , 1.4.4.3-08
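
For an awk that supports neither gensub() nor RE intervals, the same key can be built with index() and substr(). The following is only a sketch, not part of the original answer: the function name dedup_tail and the fourth sample line (ds1b) are made up for illustration.

```shell
# Sample data: the three lines from the question plus one hypothetical
# line (ds1b) whose tail differs, so it must survive the filter.
log=$(mktemp)
cat > "$log" <<'EOF'
[INFO],[02/Jun/2014-19:30:45],EXE,ds1a,INHT VERION , 1.4.4.3-08
[INFO],[02/Jun/2014-19:31:25],EXE,ds1a,INHT VERION , 1.4.4.3-08
[INFO],[02/Jun/2014-19:32:40],EXE,ds1a,INHT VERION , 1.4.4.3-08
[INFO],[02/Jun/2014-19:33:10],EXE,ds1b,INHT VERION , 1.4.4.3-08
EOF

# dedup_tail FILE: print the first line for each distinct tail, where the
# tail is everything after the third comma.
dedup_tail() {
  awk '{
    key = $0
    for (i = 1; i <= 3; i++) {
      p = index(key, ",")
      if (p == 0) { key = $0; break }  # fewer than three commas: key is the whole line
      key = substr(key, p + 1)         # drop one leading field and its comma
    }
  }
  !seen[key]++' "$1"
}

result=$(dedup_tail "$log")
printf '%s\n' "$result"
rm -f "$log"
```

Lines with fewer than three commas are keyed on the whole line, which matches what the sub() version does when its regular expression fails to match.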

Answer 2

Using a perl one-liner:

perl -lne '$k = s/(.*?,){3}//r; print if !$seen{$k}++' file.log

Outputs:

[INFO],[02/Jun/2014-19:30:45],EXE,ds1a,INHT VERION , 1.4.4.3-08

Explanation:

Switches:

-l : Enables line-ending processing. (Only needed if the last line of the log file is missing its newline.)
-n : Creates a while(<>){..} loop that runs the code once for each line of the input file.
-e : Tells perl to execute the code given on the command line.

Code:

$k = s/(.*?,){3}//r : Saves everything after the third comma in the variable $k. The /r modifier returns the modified copy and leaves the line itself unchanged.
print if !$seen{$k}++ : Prints the line if the key has not been seen before.

Answer 3

A slightly different approach, using autosplit:

perl -aF, -ne'print unless $seen{"@F[3..$#F]"}++' logfile.txt

Answer 4

You could use:

awk 'BEGIN{FS=OFS=","}{o=$0;$1=$2=$3=""}!seen[$0]++{print o;}' ...

Answer details

Answer 1: score 4, accepted, 2014-06-02 18:00:03
Answer 2: score 2, 2014-06-02 18:13:12
Answer 3: score 1, 2014-06-02 18:52:00
Answer 4: score 0, 2014-06-02 18:01:28
