How to obtain only repeated lines for a specific column in bash

Imagine I have this file:

1 3 6 name1
1 2 7 name2
3 4 2 name1
2 2 2 name3
7 8 2 name2
1 2 9 name4

How can I extract only the lines whose "name" field (the 4th column) appears more than once, and sort them?

My expected output would be:

1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2

I was trying to use sort -k4,4 myfile | uniq -D, but I can't find how to tell uniq to compare only the 4th column. Thanks!

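You were close. By default uniq compares whole lines, so you need to make it skip the fields preceding the last one: -f3 ignores the first three fields, and -D (available in GNU uniq) prints every line of each repeated group.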

$ sort -k4 file | uniq -f3 -D
1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2
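
To make the command above easy to try, here is a minimal self-contained sketch that recreates the sample file from the question and runs the same pipeline. The temporary file and the variable names sample and result are only illustrative, and the sketch assumes GNU uniq, since -D may not be available in other implementations.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1 3 6 name1
1 2 7 name2
3 4 2 name1
2 2 2 name3
7 8 2 name2
1 2 9 name4
EOF

# sort -k4,4 : sort on the 4th field only, so equal names become adjacent
# uniq -f3   : ignore the first 3 fields when comparing adjacent lines
# uniq -D    : print every line of each duplicated group (assumes GNU uniq)
result=$(sort -k4,4 "$sample" | uniq -f3 -D)
printf '%s\n' "$result"

rm -f "$sample"
```

Note that -f only skips leading fields and then compares the rest of the line, so this approach fits the case where the key is the last column, as it is here.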

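You could also try the following awk script. It counts how often each last field occurs, collects the lines for each value, and at the end prints the lines whose last field appears more than once.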

awk '
{
  a[$NF]++
  b[$NF]=(b[$NF]?b[$NF] ORS:"")$0
}
END{
  for(i in a){
    if(a[i]>1){
      print b[i]
    }
  }
}
'  Input_file

Or, if you want the output sorted, pipe the result through sort:

awk '
{
  a[$NF]++
  b[$NF]=(b[$NF]?b[$NF] ORS:"")$0
}
END{
  for(i in a){
    if(a[i]>1){
      print b[i]
    }
  }
}
'  Input_file  |  sort -k4

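You may use this awk + sort. file{,} is bash brace expansion for "file file", so awk reads the file twice: the first pass (FNR==NR) counts each last field, and the second pass prints the lines whose count is greater than 1.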

awk 'FNR==NR{freq[$NF]++; next} freq[$NF] > 1' file{,} | sort -k4

1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2
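
The same two-pass idea can be generalized to any column. The sketch below is one possible way to do that: the function name dups_by_col and its arguments are hypothetical, and it passes the file name twice explicitly so that it does not depend on bash brace expansion.

```shell
# Hypothetical helper (not from the answer): print the lines of FILE whose
# COL-th whitespace-separated field occurs more than once, sorted by that field.
dups_by_col() {
  col=$1
  file=$2
  # Pass 1 (FNR == NR): count each key. Pass 2: print lines whose key repeats.
  awk -v col="$col" 'FNR == NR { freq[$col]++; next } freq[$col] > 1' "$file" "$file" |
    sort -k"$col","$col"
}

# Example run on the sample data from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1 3 6 name1
1 2 7 name2
3 4 2 name1
2 2 2 name3
7 8 2 name2
1 2 9 name4
EOF

by_name=$(dups_by_col 4 "$sample")
printf '%s\n' "$by_name"

rm -f "$sample"
```

Because the input is read twice, this works on regular files but not on a pipe or other stream that can only be read once.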
