How to filter a file based on two separate fields' data using awk?

Question

Input file consists of 3 fields separated by "|" as follows:

TeamId|TeamName|TotalPlayers

TeamId consists of unique numbers. TeamName consists of several Premier League football teams, and the TotalPlayers field holds the corresponding number of players.

2 of the records are as follows (these records belong to one of the visible test cases):

103|Manchester United|12

105|Manchester City|13

Code requirement:

I had to output the TeamName which starts with Manchester and has the most players. If no team name starts with Manchester, there should be no output. I.e., in the above test case the output should be Manchester City.

My solution:

awk 'BEGIN{FS = "|";OFS = ",";}{if($2 ~ /^Manchester/){print $2, $3}}' | sort -n -k2 | awk -F , '(NR==1){print $1}'

This provided the expected output for the normal test cases, but the hidden test cases failed. What changes can I make to this, or is there an easier way to achieve the same result?
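Judging from the pipeline alone, one likely cause of the hidden failures is the `sort` step. Without `-t`, `sort -n -k2` splits fields on blanks rather than on the comma. The key for `Manchester United,12` is therefore `United,12`, which is not a number and compares as 0 on every line. With all keys equal, sort falls back to comparing whole lines, so the pipeline returns the alphabetically first Manchester team. That happens to be City in the visible test. The sort is also ascending, so even with the right key, line 1 would hold the smallest squad. Below is a minimal corrected sketch. It assumes team names never contain a comma, and `top_manchester` is only an illustrative name. On a tie it prints just one team.

```shell
#!/bin/sh
# Hypothetical helper: print the Manchester team with the most players in file "$1".
top_manchester() {
  awk -F'|' -v OFS=',' '$2 ~ /^Manchester/ {print $2, $3}' "$1" |
    sort -t',' -k2,2nr |   # split on commas; sort on field 2 only, numerically, largest first
    head -n 1 |            # keep the biggest squad
    cut -d',' -f1          # print only the team name
}

# Sample input from the question
printf '%s\n' 'TeamId|TeamName|TotalPlayers' '103|Manchester United|12' '105|Manchester City|13' > file
top_manchester file   # prints: Manchester City
```

If no line matches, every stage receives empty input, so nothing is printed, which satisfies the "no output" rule.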

Also, please recommend any websites where I can practice these kinds of Unix coding problems.

Answer 1

I had to output the TeamName which starts with Manchester and has the most players. If no team name starts with Manchester, there should be no output. I.e., in the above test case the output should be Manchester City.

$ cat file 
TeamId|TeamName|TotalPlayers
103|Manchester United|12
105|Manchester City|13

$ awk -F'|' '$2~/^Manchester/ && $3 >max{max=$3; team=$2}END{if(team)print team}' file 
Manchester City
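The one-liner above tracks a running maximum in a single pass, so no sort is needed. Below is a slightly more defensive sketch of the same idea. The `found` flag, the `+ 0` coercion, the `NR > 1` header skip and the helper name `best_manchester` are additions, not part of the answer. The flag lets the first matching row always win, even with a count of 0. The coercion forces a numeric comparison even if the count carries trailing junk, for instance a `\r` from Windows line endings. Like the original, it prints only the first team seen on a tie.

```shell
# Hypothetical helper name; the awk program is a variant of the one-liner above.
best_manchester() {
  awk -F'|' '
    NR > 1 && $2 ~ /^Manchester/ {   # skip the header, keep Manchester teams only
      n = $3 + 0                     # numeric player count (ignores trailing junk such as \r)
      if (!found || n > max) {       # first match, or a strictly larger squad
        max = n; team = $2; found = 1
      }
    }
    END { if (found) print team }    # no match means no output at all
  ' "$1"
}

printf '%s\n' 'TeamId|TeamName|TotalPlayers' '103|Manchester United|12' '105|Manchester City|13' > file
best_manchester file   # prints: Manchester City
```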

Answer 2

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  FS="|"
}
FNR>1 && $2~/^Manchester/{
  arr[$NF]=(arr[$NF]?arr[$NF] ORS:"")$2
  max=(max>$NF?max:$NF)
}
END{
  if(max!=""){
    num=split(arr[max],val,ORS)
    if(num>1){
       for(i=1;i<=num;i++){
          print val[i],max
       }
    }
    else{ print arr[max],max }
  }
}
'  Input_file

Explanation: Adding a detailed explanation for the above.

awk '                                      ##Starting awk program from here.
BEGIN{                                     ##Starting BEGIN section of program from here.
  FS="|"                                   ##Setting FS as | here.
}
FNR>1 && $2~/^Manchester/{                 ##Checking condition if line number is more than 1 and 2nd field starts with Manchester then do following.
  arr[$NF]=(arr[$NF]?arr[$NF] ORS:"")$2    ##Creating array arr indexed by last field (player count) and appending 2nd field (team name) to it, separated by ORS, so all teams sharing the same count are kept.
  max=(max>$NF?max:$NF)                    ##Creating max: if current max is greater than last field's value then keep it, else assign last field's value to max.
}
END{                                       ##Starting END block of this program from here.
  if(max!=""){                             ##Checking condition if max is NOT NULL then do following.
    num=split(arr[max],val,ORS)            ##Splitting arr[max] value into val array with delimiter of ORS here.
    if(num>1){                             ##if num(total number of elements in val) is greater than 1 then do following.
       for(i=1;i<=num;i++){                ##Start a loop till value of num here.
          print val[i],max                 ##Printing value of val with index i and max here.
       }
    }
    else{ print arr[max],max }             ##Else printing value of arr[max] and max only 1 time.
  }
}
' Input_file                               ##Mentioning Input_file name here.
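The question's requirement asks for the TeamName only, while the answer above prints each winning name followed by the player count. If the count should be dropped but ties should still be reported, a shorter single-pass sketch can rebuild the list of names whenever a new maximum appears. Printing one name per line on a tie is an assumption about the expected format. `best_manchester_all` is a hypothetical name.

```shell
# Hypothetical helper: print every Manchester team that shares the largest squad, names only.
best_manchester_all() {
  awk -F'|' '
    NR > 1 && $2 ~ /^Manchester/ {
      n = $3 + 0
      if (!found || n > max) {       # new maximum: restart the list of names
        max = n; names = $2; found = 1
      } else if (n == max) {         # tie: append this team on its own line
        names = names ORS $2
      }
    }
    END { if (found) print names }   # nothing is printed when no team matched
  ' "$1"
}

printf '%s\n' 'TeamId|TeamName|TotalPlayers' '103|Manchester United|13' '105|Manchester City|13' > file
best_manchester_all file   # prints Manchester United, then Manchester City
```

Because a single name is a list of length one, the `num>1` / `else` split in the answer's END block is not needed here.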


2 solutions

Solution 1
6 2020-12-24 10:40:16

Solution 2
2 2020-12-24 10:48:26
