
How to print the first occurrence of a column matching more than once with awk

I have a log file listing all my backups. A value of YES in the last column means the backup is preserved, i.e. the retention policy will not delete it. A given VM name can have one or more rows with that preserved column set to YES.

My input is:

=    FULL     ==   20210105   ==     2100     == ASR-FULL-20210105-2100 ==  YES
=    FULL     ==   20210202   ==     2100     == ASR-FULL-20210202-2100 ==  YES
=    FULL     ==   20210302   ==     2100     == ASR-FULL-20210302-2100 ==  YES
=    FULL     ==   20210406   ==     2100     == ASR-FULL-20210406-2100 ==  YES
=    FULL     ==   20210105   ==     2146     == DNS10_7-FULL-20210105-2146 ==  YES
=    FULL     ==   20210202   ==     2153     == DNS10_7-FULL-20210202-2153 ==  YES
=    FULL     ==   20210302   ==     2148     == DNS10_7-FULL-20210302-2148 ==  YES
=    FULL     ==   20210406   ==     2122     == DNS10_7-FULL-20210406-2122 ==  YES
=    FULL     ==   20210105   ==     2105     == execnet.0-FULL-20210105-2105 ==  YES
=    FULL     ==   20210202   ==     2106     == execnet.0-FULL-20210202-2106 ==  YES
=    FULL     ==   20210302   ==     2106     == execnet.0-FULL-20210302-2106 ==  YES
=    FULL     ==   20210406   ==     2105     == execnet.0-FULL-20210406-2105 ==  YES
=    FULL     ==   20210106   ==     0200     == Prtgadmin.0-FULL-20210106-0200 ==  YES
=    FULL     ==   20210105   ==     2216     == sandbox.0-FULL-20210105-2216 ==  YES
=    FULL     ==   20210202   ==     2227     == sandbox.0-FULL-20210202-2227 ==  YES
=    FULL     ==   20210406   ==     2152     == sandbox.0-FULL-20210406-2152 ==  YES
=    FULL     ==   20210105   ==     2236     == wwwp.0-FULL-20210105-2236 ==  YES
=    FULL     ==   20210202   ==     2249     == wwwp.0-FULL-20210202-2249 ==  YES
=    FULL     ==   20210105   ==     2259     == wwws.0-FULL-20210105-2259 ==  YES
=    FULL     ==   20210202   ==     2314     == wwws.0-FULL-20210202-2314 ==  YES
=    FULL     ==   20210105   ==     2259     == webhost.0-FULL-20210105-2259 ==  YES

My desired output is the n-1 oldest matches for each VM name (the top n-1, i.e. every match except the most recent one):

ASR-FULL-20210105-2100        
ASR-FULL-20210202-2100         
ASR-FULL-20210302-2100         
DNS10_7-FULL-20210105-2146     
DNS10_7-FULL-20210202-2153     
DNS10_7-FULL-20210302-2148     
execnet.0-FULL-20210105-2105  
execnet.0-FULL-20210202-2106   
execnet.0-FULL-20210302-2106   
sandbox.0-FULL-20210105-2216   
sandbox.0-FULL-20210202-2227   
wwwp.0-FULL-20210105-2236     
wwws.0-FULL-20210105-2259

So far I can get the result below by running the following awk commands, but it shows the most recent matches instead. Ideally I would also like a single awk command. The year filter is not that important.

# cat bkp_list.log| grep -E '*2021.*YES'| awk -F[==-] 'cnt[$8]++{if (cnt[$8]>1) print prev=$0;next}' |awk -F[==] '{print $8}' 
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
ASR-FULL-20210406-2100
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
DNS10_7-FULL-20210406-2122
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
execnet.0-FULL-20210406-2105
sandbox.0-FULL-20210202-2227
sandbox.0-FULL-20210406-2152
wwwp.0-FULL-20210202-2249
wwws.0-FULL-20210202-2314

Thank you

To print all but the last match for each VM name (the part of $8 before -FULL-), you may use this awk:

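awk '
# Skip rows that are not preserved.
$NF != "YES" {next}
{
   # s is the VM name: $8 with the -FULL-date-time suffix removed.
   s = $8
   sub(/-FULL-.*/, "", s)
}
# Same VM as the previous row: the previous row is not the last match, so print it.
s == ps {
   print pval
}
{
   # Remember this row for comparison with the next one.
   ps = s
   pval = $8
}' file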

ASR-FULL-20210105-2100
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
DNS10_7-FULL-20210105-2146
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
execnet.0-FULL-20210105-2105
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
sandbox.0-FULL-20210105-2216
sandbox.0-FULL-20210202-2227
wwwp.0-FULL-20210105-2236
wwws.0-FULL-20210105-2259

Or as a one-liner:

awk '$NF != "YES"{next} {s=$8; sub(/-FULL-.*/, "", s)} s == ps {print pval} {ps = s; pval=$8}' file
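The script above compares each row with the previous one, so it relies on all rows for a VM being adjacent. If that cannot be guaranteed, one possible variant is to read the file twice: count the preserved rows per VM on the first pass, then print every preserved row except the last one on the second pass. This is only a sketch under assumptions: the sample rows are taken from the question but deliberately interleaved, the temporary file stands in for the real log, and "last" means last in file order, so each VM's rows are still assumed to appear oldest first.

```shell
# Sample rows from the question, interleaved on purpose (temporary stand-in for the log).
sample=$(mktemp)
cat > "$sample" <<'EOF'
=    FULL     ==   20210105   ==     2100     == ASR-FULL-20210105-2100 ==  YES
=    FULL     ==   20210105   ==     2216     == sandbox.0-FULL-20210105-2216 ==  YES
=    FULL     ==   20210202   ==     2100     == ASR-FULL-20210202-2100 ==  YES
=    FULL     ==   20210106   ==     0200     == Prtgadmin.0-FULL-20210106-0200 ==  YES
=    FULL     ==   20210202   ==     2227     == sandbox.0-FULL-20210202-2227 ==  YES
=    FULL     ==   20210302   ==     2100     == ASR-FULL-20210302-2100 ==  YES
=    FULL     ==   20210406   ==     2152     == sandbox.0-FULL-20210406-2152 ==  YES
EOF

out=$(awk '
# Pass 1 (first copy of the file): count preserved rows per VM name.
NR == FNR {
    if ($NF == "YES") { k = $8; sub(/-FULL-.*/, "", k); cnt[k]++ }
    next
}
# Pass 2 (second copy): print a preserved row only if more rows of the same VM follow.
$NF == "YES" {
    k = $8; sub(/-FULL-.*/, "", k)
    if (--cnt[k] > 0) print $8
}' "$sample" "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Passing the same file name twice is what gives awk the two passes; NR == FNR is only true while the first copy is being read.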

If you want to filter on the column containing YES, you can do it with a conditional expression placed before the action block:

$ cat file
=    FULL     ==   20210105   ==     2100     == ASR-FULL-20210105-2100 ==  NO
=    FULL     ==   20210202   ==     2100     == ASR-FULL-20210202-2100 ==  YES
=    FULL     ==   20210302   ==     2100     == ASR-FULL-20210302-2100 ==  YES
=    FULL     ==   20210406   ==     2100     == ASR-FULL-20210406-2100 ==  YES
=    FULL     ==   20210105   ==     2146     == DNS10_7-FULL-20210105-2146 ==  YES
=    FULL     ==   20210202   ==     2153     == DNS10_7-FULL-20210202-2153 ==  YES
=    FULL     ==   20210302   ==     2148     == DNS10_7-FULL-20210302-2148 ==  YES
=    FULL     ==   20210406   ==     2122     == DNS10_7-FULL-20210406-2122 ==  YES
=    FULL     ==   20210105   ==     2105     == execnet.0-FULL-20210105-2105 ==  YES
=    FULL     ==   20210202   ==     2106     == execnet.0-FULL-20210202-2106 ==  YES
=    FULL     ==   20210302   ==     2106     == execnet.0-FULL-20210302-2106 ==  YES
=    FULL     ==   20210406   ==     2105     == execnet.0-FULL-20210406-2105 ==  YES
=    FULL     ==   20210106   ==     0200     == Prtgadmin.0-FULL-20210106-0200 ==  YES
=    FULL     ==   20210105   ==     2216     == sandbox.0-FULL-20210105-2216 ==  YES
=    FULL     ==   20210202   ==     2227     == sandbox.0-FULL-20210202-2227 ==  YES
=    FULL     ==   20210406   ==     2152     == sandbox.0-FULL-20210406-2152 ==  YES
=    FULL     ==   20210105   ==     2236     == wwwp.0-FULL-20210105-2236 ==  YES
=    FULL     ==   20210202   ==     2249     == wwwp.0-FULL-20210202-2249 ==  YES
=    FULL     ==   20210105   ==     2259     == wwws.0-FULL-20210105-2259 ==  YES
=    FULL     ==   20210202   ==     2314     == wwws.0-FULL-20210202-2314 ==  YES
=    FULL     ==   20210105   ==     2259     == webhost.0-FULL-20210105-2259 ==  YES

$ awk ' $NF == "YES" { print $(NF-2) }' file
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
ASR-FULL-20210406-2100
DNS10_7-FULL-20210105-2146
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
DNS10_7-FULL-20210406-2122
execnet.0-FULL-20210105-2105
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
execnet.0-FULL-20210406-2105
Prtgadmin.0-FULL-20210106-0200
sandbox.0-FULL-20210105-2216
sandbox.0-FULL-20210202-2227
sandbox.0-FULL-20210406-2152
wwwp.0-FULL-20210105-2236
wwwp.0-FULL-20210202-2249
wwws.0-FULL-20210105-2259
wwws.0-FULL-20210202-2314
webhost.0-FULL-20210105-2259

$ awk ' $NF == "NO" { print $(NF-2) }' file
ASR-FULL-20210105-2100
$

Note: I changed the first line from YES to NO to check that the filter behaves correctly.

Anyway, if you need any other special filtering, such as checking the year, please specify.
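As an illustration of combining conditions, the year could be checked against the date column ($4) in the same pattern. This is a minimal sketch, not part of the original answer: the variable name year is my own choice, and the first sample row (dated 2020) is hypothetical, added only so the filter has something to exclude.

```shell
# Temporary sample file. The first row is hypothetical; the others come from the data above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
=    FULL     ==   20201201   ==     2100     == ASR-FULL-20201201-2100 ==  YES
=    FULL     ==   20210105   ==     2100     == ASR-FULL-20210105-2100 ==  NO
=    FULL     ==   20210202   ==     2100     == ASR-FULL-20210202-2100 ==  YES
=    FULL     ==   20210105   ==     2146     == DNS10_7-FULL-20210105-2146 ==  YES
EOF

# Keep rows whose last field is YES and whose date field starts with the given year.
out=$(awk -v year=2021 '$NF == "YES" && substr($4, 1, 4) == year { print $(NF-2) }' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Conditions joined with && must all hold for the block to run, so further filters can be appended in the same way.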

With GNU awk for gensub():

$ tac file | awk '$NF=="YES" && seen[gensub(/-.*/,"",1,$8)]++{print $8}' | tac
ASR-FULL-20210105-2100
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
DNS10_7-FULL-20210105-2146
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
execnet.0-FULL-20210105-2105
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
sandbox.0-FULL-20210105-2216
sandbox.0-FULL-20210202-2227
wwwp.0-FULL-20210105-2236
wwws.0-FULL-20210105-2259

Or with any awk:

$ tac file | awk '$NF!="YES"{next} {k=$8; sub(/-.*/,"",k)} seen[k]++{print $8}' | tac
ASR-FULL-20210105-2100
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
DNS10_7-FULL-20210105-2146
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
execnet.0-FULL-20210105-2105
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
sandbox.0-FULL-20210105-2216
sandbox.0-FULL-20210202-2227
wwwp.0-FULL-20210105-2236
wwws.0-FULL-20210105-2259
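Both pipelines work because, once the input is reversed, the most recent row of each VM is seen first, and seen[k]++ is zero (false) only on that first sighting. If tac is not available, or a single awk command is preferred, the same idea can be sketched by buffering the rows and walking them backwards in an END block. The array names below are my own, and I use the -FULL- suffix for the key rather than the first hyphen, which is an assumption about how VM names are formed.

```shell
# Temporary sample file with rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
=    FULL     ==   20210105   ==     2100     == ASR-FULL-20210105-2100 ==  YES
=    FULL     ==   20210202   ==     2100     == ASR-FULL-20210202-2100 ==  YES
=    FULL     ==   20210302   ==     2100     == ASR-FULL-20210302-2100 ==  YES
=    FULL     ==   20210406   ==     2100     == ASR-FULL-20210406-2100 ==  YES
=    FULL     ==   20210106   ==     0200     == Prtgadmin.0-FULL-20210106-0200 ==  YES
=    FULL     ==   20210105   ==     2236     == wwwp.0-FULL-20210105-2236 ==  YES
=    FULL     ==   20210202   ==     2249     == wwwp.0-FULL-20210202-2249 ==  YES
EOF

out=$(awk '
$NF != "YES" { next }
{
    # Buffer each preserved backup name together with its VM key.
    n++
    name[n] = $8
    key[n] = $8
    sub(/-FULL-.*/, "", key[n])
}
END {
    # Walk backwards: the first sighting of a key is its most recent row, so skip it.
    for (i = n; i >= 1; i--)
        if (seen[key[i]]++) keep[i] = 1
    # Print the kept rows in their original order.
    for (i = 1; i <= n; i++)
        if (i in keep) print name[i]
}' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

This holds every preserved row in memory, which should be fine for a backup log of this size but is a trade-off compared with the streaming tac pipeline.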
