How to get only the lines with a duplicated value in a specific column in bash

Question

Imagine I have this file in bash:

1 3 6 name1
1 2 7 name2
3 4 2 name1
2 2 2 name3
7 8 2 name2
1 2 9 name4

How could I extract just those lines in which the "name" field is repeated, and sort them?

My expected output would be:

1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2

I was trying to use sort -k4,4 myfile | uniq -D, but I can't find how to tell uniq to work with the 4th column. Thanks!

Answer 1

You were close. You need to tell uniq to skip the fields preceding the last one, which is what -f3 does.

$ sort -k4 file | uniq -f3 -D
1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2
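
For readers wondering what each flag contributes, here is a minimal sketch of the same pipeline wrapped in a function. The name dups_by_field4 is hypothetical, and the sketch assumes GNU coreutils, since uniq -D is not part of POSIX. It also adds -k4,4 and -s to sort (an assumption, not part of the answer above) so that lines sharing a name keep their original relative order.

```shell
# dups_by_field4 is a hypothetical helper name, not part of any tool.
# Assumes GNU uniq, since -D is not specified by POSIX.
dups_by_field4() {
  # sort -k4,4 : use only field 4 as the sort key
  # sort -s    : stable sort, so lines with equal keys keep input order
  # uniq -f3   : skip the first 3 fields when comparing adjacent lines
  # uniq -D    : print every line that belongs to a duplicated group
  sort -s -k4,4 "$1" | uniq -f3 -D
}

# Usage with the sample data from the question (the temporary file is illustrative).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 3 6 name1
1 2 7 name2
3 4 2 name1
2 2 2 name3
7 8 2 name2
1 2 9 name4
EOF
dups_by_field4 "$tmp"
rm -f "$tmp"
```

Because uniq only compares adjacent lines, the sort step has to come first; without it, the two name1 lines would never be seen as duplicates.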

Answer 2

Could you please try the following.

awk '
{
  a[$NF]++
  b[$NF]=(b[$NF]?b[$NF] ORS:"")$0
}
END{
  for(i in a){
    if(a[i]>1){
      print b[i]
    }
  }
}
'  Input_file

Or, in case you want to sort the output, try the following.

awk '
{
  a[$NF]++
  b[$NF]=(b[$NF]?b[$NF] ORS:"")$0
}
END{
  for(i in a){
    if(a[i]>1){
      print b[i]
    }
  }
}
'  Input_file  |  sort -k4
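
The awk approach above keys on $NF, the last field. Since the question asks about a specific column, the following is a rough sketch of the same idea with the column passed in as a parameter. The function name dups_by_col and the awk variable col are hypothetical, and the stable sort flag -s is assumed to be supported by the local sort.

```shell
# dups_by_col is a hypothetical helper: $1 = 1-based column number, $2 = input file.
dups_by_col() {
  awk -v col="$1" '
    {
      count[$col]++                                       # occurrences of this key
      # append the whole line to the lines already stored for this key
      lines[$col] = (lines[$col] == "" ? "" : lines[$col] ORS) $0
    }
    END {
      # "for (k in count)" visits keys in an unspecified order,
      # which is why the result is piped through sort afterwards
      for (k in count)
        if (count[k] > 1)
          print lines[k]
    }
  ' "$2" | sort -s -k"$1","$1"   # -s keeps input order within each group
}
```

Calling dups_by_col 4 Input_file should give the same lines as the version above, while dups_by_col 2 Input_file would group on the second column instead.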

Answer 3

You may use this awk + sort:

awk 'FNR==NR{freq[$NF]++; next} freq[$NF] > 1' file{,} | sort -k4

1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2
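
The one-liner above reads the file twice: in bash, file{,} expands to file file, and FNR==NR is true only while awk is reading the first copy. A minimal sketch of the same two-pass idea without brace expansion follows, so it also works in shells that lack it. The function name dups_two_pass is hypothetical, and the input is assumed to be a regular file, since a pipe cannot be read twice.

```shell
# dups_two_pass is a hypothetical helper: $1 = input file (must be re-readable).
dups_two_pass() {
  awk '
    # First pass: FNR (per-file line number) equals NR (global line number)
    # only for the first file argument, so just count each last field.
    FNR == NR { freq[$NF]++; next }
    # Second pass: print lines whose last field was seen more than once.
    freq[$NF] > 1
  ' "$1" "$1" | sort -s -k4,4   # group by field 4, keeping input order within a group
}
```

Compared with storing every line in an array, this variant keeps only one counter per distinct name in memory, at the cost of reading the file a second time.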

Answer 1: 3, accepted, 2020-04-17 14:26:41
Answer 2: 2, 2020-04-17 14:22:05
Answer 3: 1, 2020-04-17 14:23:52