Exclude columns from awk

I am trying to delete a few columns and then deduplicate the file contents. The columns I want to delete are the month, day, time, and epoch time; these differ on every line, which prevents me from deduplicating the file contents.

Sample contents of sample.log:

Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:14 AAA AAA AAAA 1433495594.306612 XXXX CCCC CCCC AAAA SDDDD DFFFFF222
Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:16 AAA AAA AAAA XXXXX 1433495597.306615 XXXX CCCC CCCC AAAA SDDDD DFFFFF333
Jun  5 05:13:17 AAA AAA AAAA XXXXX 1433495598.306616 XXXX CCCC CCCC AAAA SDDDD DFFFFF444

Issue:

The month, day, and time are in fixed columns; however, the epoch time toggles between column 7 and column 8. I want to know how to deal with this.

Sample output:

Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:13 AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
Jun  5 05:13:15 AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

If the above is too much to ask, then something like this:

AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA 1433495593.306611 XXXX CCCC CCCC AAAA SDDDD DFFFFF111
AAA AAA AAAA XXXXX 1433495596.306614 XXXX CCCC CCCC AAAA SDDDD DFFFFF111

I have been trying something along the following lines, but it is not getting me very far.

while read line
    do

seven=$(echo $line |awk '{print $7}')
eight=$(echo $line |awk '{print $8}')

if [[ "$seven" =~ "^[0-9]" ]];then
    #echo "seventh column starts with number"
    echo $line|awk '$1=$2=$3=$7=" " {print}'
else
    #echo "Eighth column starts with number"
     echo $line|awk '$1=$2=$3=$8=" " {print}'
fi
    done < $1

More example:

Input file contents:

Jun  5 05:13:13 AAA BBB CCC 142222222222.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun  5 05:13:14 AAA BBB CCC 142222222224.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC XXX 142222222226.000 DDD EEE FFFF

Output:

Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF

OR

Output:

 AAA BBB CCC  DDD EEE FFFF
 AAA BBB CCC  DDD EEE GGGG
 AAA BBB CCC XXX  DDD EEE GGGG
 AAA BBB CCC XXX  DDD EEE FFFF

A very basic approach is to check the format of the field: if it consists of digits + . + digits, that's the one!

awk '{$1=$2=$3=""
      if ($7 ~ /^[0-9]+\.[0-9]+$/) {$7=""}
      else {$8=""}
     } 1' file

Note that this leaves some extra spaces behind, because when you empty a field the field separators around it remain in the record. For a clean removal of columns, check Ed Morton's answer to "Print all but the first three columns".
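One way to get rid of those leftover separators, shown here as a minimal sketch and not as the approach from the linked answer: after blanking the fields, assign `$0 = $0` so awk re-splits the record on whitespace, then `$1 = $1` so it rebuilds the record with single spaces. The function name `strip_columns` is only an illustrative choice, and the sketch assumes the default whitespace field separator.

```shell
# Sketch: blank the date/time/epoch fields, then squeeze the leftover spaces.
# strip_columns is a hypothetical helper name, not something from the question.
strip_columns() {
    awk '{
        $1 = $2 = $3 = ""                       # month, day, time
        if ($7 ~ /^[0-9]+\.[0-9]+$/) { $7 = "" } else { $8 = "" }   # epoch time
        $0 = $0                                 # re-split: empty fields disappear
        $1 = $1                                 # rebuild the record with single spaces
        print
    }' "$@"
}
```

It could be called as `strip_columns sample.log`, or fed from a pipe.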


To print each resulting line only once, apply the awk '!uniq[$0]++' file idiom after blanking the fields, so that duplicates are detected on what remains of the line:

awk '{$1=$2=$3=""
      if ($7 ~ /^[0-9]+\.[0-9]+$/) {$7=""}
      else {$8=""}
     } !uniq[$0]++' file
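For readers unfamiliar with the `!uniq[$0]++` idiom mentioned above, here is a small stand-alone sketch. The array entry for a line is 0 (false) the first time the line is seen, so the negation is true and awk prints it; the post-increment then makes every later occurrence evaluate to false. The function name `first_seen` and the array name `seen` are arbitrary.

```shell
# Sketch: keep only the first occurrence of each line, preserving input order.
# first_seen is a hypothetical helper name.
first_seen() {
    awk '!seen[$0]++' "$@"
}
```

Unlike `sort -u`, this keeps the lines in the order in which they first appear, at the cost of holding every distinct line in memory.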

If I'm understanding the question correctly, there is no need for Bash here, just Awk:

% awk '
{
    for (f = 4; f <= NF; ++f) { # Start at column 4
        if (f == 7 || f == 8) { # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) { # Only print if non-numeric
                printf $f " "
            }
        } else {
            printf $f " "
        }
    }
    printf "\n"
}
' sample.log          
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111 
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF222 
AAA AAA AAAA XXXX CCCC CCCC AAAA SDDDD DFFFFF111 
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF111 
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF333 
AAA AAA AAAA XXXXX XXXX CCCC CCCC AAAA SDDDD DFFFFF444 

To grab the unique rows:

% awk '             
{
    for (f = 4; f <= NF; ++f) { # Start at column 4
        if (f == 7 || f == 8) { # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) { # Only print if non-numeric
                printf $f " "
            }
        } else {
            printf $f " "
        }
    }
    printf "\n"
}
' sample2.log | sort -u
AAA BBB CCC DDD EEE FFFF 
AAA BBB CCC DDD EEE GGGG 
AAA BBB CCC XXX DDD EEE FFFF 
AAA BBB CCC XXX DDD EEE GGGG 

On handling % signs...

If your input file contains % signs, per your comment, you'll need to escape these before passing them to printf, because the field is being used as the format string. You could do that with a function like this:

% awk '             
function escape_percents(s) 
{ 
    gsub("%", "%%", s) 
    return s
}

{
    for (f = 4; f <= NF; ++f) { # Start at column 4
        if (f == 7 || f == 8) { # Treat columns 7 or 8 differently
            if ($f !~ /^[0-9]+\.[0-9]+$/) { # Only print if non-numeric
                printf escape_percents($f) " "
            }
        } else {
            printf escape_percents($f) " "
        }
    }
    printf "\n"
}
' sample2.log | sort -u
AAA BBB CCC DDD %E%E%E FFFF 
AAA BBB CCC DDD %E%E%E GGGG 
AAA BBB CCC XXX DDD %E%E%E FFFF 
AAA BBB CCC XXX DDD %E%E%E GGGG 
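An alternative to escaping, sketched here as a variation and not as part of the answer above: pass a fixed format string to printf and hand the field over as an argument, so any % in the data is never interpreted. The function name `drop_epoch` and the `sep` variable are illustrative; using `sep` also avoids the trailing space at the end of each line.

```shell
# Sketch: print fields 4..NF, skipping the epoch time in column 7 or 8.
# drop_epoch is a hypothetical helper name.
drop_epoch() {
    awk '{
        sep = ""
        for (f = 4; f <= NF; ++f) {
            # Skip column 7 or 8 when it looks like digits.digits
            if ((f == 7 || f == 8) && $f ~ /^[0-9]+\.[0-9]+$/) continue
            printf "%s%s", sep, $f      # data is an argument, never the format
            sep = " "
        }
        print ""
    }' "$@"
}
```

It could be used in the same way as above, for example `drop_epoch sample2.log | sort -u`.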

If the number of columns after the epoch time stays constant, then the easiest way is to address the fields relative to NF.

Using the input from "More example":

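awk '{
    NewLine = $4                        # first field to keep
    for (i = NF - 5; i >= 0; i--) {     # walk from field 5 to NF, counted from the end
        if (i != 3) {                   # $(NF-3) is the epoch time: skip it
            NewLine = NewLine " " $(NF - i)
        }
    }
    print NewLine
}' Sample.log | sort | uniq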

Using the input

Jun  5 05:13:13 AAA BBB CCC 142222222222.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC 142222222223.000 DDD EEE FFFF
Jun  5 05:13:14 AAA BBB CCC 142222222224.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE GGGG
Jun  5 05:13:13 AAA BBB CCC XXX 142222222225.000 DDD EEE FFFF
Jun  5 05:13:13 AAA BBB CCC XXX 142222222226.000 DDD EEE FFFF

you will get

AAA BBB CCC DDD EEE FFFF
AAA BBB CCC DDD EEE GGGG
AAA BBB CCC XXX DDD EEE FFFF
AAA BBB CCC XXX DDD EEE GGGG
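If the original line order matters, the deduplication can also be done inside awk instead of piping to sort and uniq. The following is a sketch under the same assumption as above, namely that exactly three fields follow the epoch time; the function name `dedupe_tail` and the variable names are made up for illustration.

```shell
# Sketch: drop fields 1-3 and the epoch time, print each result once.
# dedupe_tail is a hypothetical helper name.
dedupe_tail() {
    awk '{
        epoch = NF - 3                  # assumption: three fields follow the epoch time
        line = ""
        for (i = 4; i <= NF; i++) {
            if (i != epoch) {
                line = line (line == "" ? "" : " ") $i
            }
        }
        if (!seen[line]++) print line   # first occurrence only, input order kept
    }' "$@"
}
```

With the six input lines shown above, this prints the same four lines as the sort and uniq pipeline, but in the order in which they first occur in the file.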
