
Remove duplicates ignoring specific columns

I want to remove all duplicates from a file but ignoring the first 2 columns, I mean don't comparing those columns.我想从文件中删除所有重复项但忽略前两列,我的意思是不要比较这些列。

This is my example input:

111  06:22  apples, bananas and pears
112  06:28  bananas
113  07:07  apples, bananas and pears
114  07:23  apples and bananas
115  08:01  bananas and pears
116  08:23  pears
117  09:22  apples, bananas and pears
118  12:23  apples and bananas

I want this output:

111  06:22  apples, bananas and pears
112  06:28  bananas
114  07:23  apples and bananas
115  08:01  bananas and pears
116  08:23  pears

I've tried the command below, but it only compares the third column and ignores the rest of the line:

awk '!seen[$3]++' sample.txt
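
For example, printing just the field that this command compares shows the problem (using the same sample.txt):

awk '{ print $3 }' sample.txt   # the keys are only "apples,", "bananas", "apples", "pears", ...

Line 115 would then be dropped because its third field, bananas, was already seen on line 112, even though the full lines differ.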

Store $0 in a temporary variable, set $1 and $2 to empty strings, then use the newly rebuilt $0 as the key:

awk '{ t = $0; $1 = $2 = "" } !seen[$0]++ { print t }' sample.txt
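The same one-liner written out with comments, as an equivalent sketch of what it does:

awk '{
    t = $0             # remember the original line for printing
    $1 = $2 = ""       # blank the first two fields; awk rebuilds $0 from the remaining ones
}
!seen[$0]++ {          # the rebuilt $0 (columns 3 onward) is the deduplication key
    print t            # print the untouched original line the first time its key appears
}' sample.txt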

You can use the substr string function to get the desired part of each line for comparison. Let file.txt contain:

111  06:22  apples, bananas and pears
112  06:28  bananas
113  07:07  apples, bananas and pears
114  07:23  apples and bananas
115  08:01  bananas and pears
116  08:23  pears
117  09:22  apples, bananas and pears
118  12:23  apples and bananas

then

awk '!arr[substr($0,11)]++' file.txt

gives this output:

111  06:22  apples, bananas and pears
112  06:28  bananas
114  07:23  apples and bananas
115  08:01  bananas and pears
116  08:23  pears

Explanation: keep only the lines whose substring of the whole line ($0), starting at the 11th character (i.e. everything after the first two fixed-width columns), has not been seen before.

(tested in GNU Awk 5.0.1)
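
If the first two columns are not always a fixed width, a variant of the same idea (a sketch, not part of the original answer, assuming every line has at least three whitespace-separated fields) computes the offset with match() instead of hard-coding 11; match() sets RLENGTH to the length of the matched prefix:

awk '{ match($0, /^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+/) }   # length of columns 1-2 plus the spaces after them
     !seen[substr($0, RLENGTH + 1)]++' file.txt

On this input it gives the same output as above, but it keeps working if the line numbers or times grow wider than 10 characters.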
