Basic grep/sed/awk script to find duplicates
I'm starting out with regular expressions and grep, and I want to find out how to do this. I have this list:
1. 12493 6530
2. 12475 5462
3. 12441 5450
4. 12413 5258
5. 12478 4454
6. 12416 3859
7. 12480 3761
8. 12390 3746
9. 12487 3741
10. 12476 3557
...
And I want to get the contents of the middle column only (so NF==2 in awk?). The delimiter here is a space.
I then want to find which numbers appear more than once (duplicates). How would I go about doing that? Thank you, I'm a beginner.
awk '{count[$2]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file
But you don't have duplicate numbers in the 2nd column.
$2 is the second column in awk.
count[$2]++ increments an array value, using the number just read as the key.
The END block is executed at the end of the input; there we test each array value to find the keys whose count is greater than 1.
A more concise version (credit to jthill), which prints each value the second time it is seen:
awk '++count[$2]==2{print $2}' file
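Since the list in the question contains no repeated values, here is a small sketch on made-up data to show what the concise command prints. The file name sample.txt and its contents are assumptions for illustration only, not data from the question.

```shell
# Hypothetical sample data: 12493 and 12475 repeat in the middle column.
printf '%s\n' \
  '1. 12493 6530' \
  '2. 12475 5462' \
  '3. 12493 5450' \
  '4. 12413 5258' \
  '5. 12475 4454' \
  '6. 12493 3859' > sample.txt

# Print a column-2 value exactly when its count reaches 2,
# so each duplicate is reported once, however often it repeats.
dups=$(awk '++count[$2]==2{print $2}' sample.txt)
echo "$dups"
```

This version reports duplicates in the order in which their second occurrence appears, whereas the END-block version loops with for (a in count), whose iteration order awk does not specify.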
Using perl:
perl -anE '$h{$F[1]}++; END{ say for grep $h{$_} > 1, keys %h }'
Iterate over the lines and build a hash ( %h / $h{...} ) with the count ( ++ ) of the second column values ( $F[1] ), and after that ( END{ ... } ) say all hash keys with a count ( $h{$_} ) which is > 1.
With the data stored in test, using a combination of the awk, sort, uniq and sed commands:
cat test | awk -v x=2 '{print $x}' | sort | uniq -c | sed '/^ *1 /d' | awk -v x=2 '{print $x}'
Explanation:
awk -v x=2 '{print $x}' selects the 2nd column
sort puts equal values next to each other, which uniq needs in order to count them
uniq -c counts the appearances of each number
sed '/^ *1 /d' deletes all the entries with only one appearance (common implementations such as GNU uniq pad the count with leading spaces, so the pattern allows for them)
awk -v x=2 '{print $x}' removes the count again, leaving only the number
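If only the duplicated values are needed and not their counts, the counting and filtering steps can be collapsed with uniq -d, which prints one copy of each repeated line. This is an alternative sketch rather than the pipeline from the answer above; the file name sample_uniq.txt and its contents are made up for illustration.

```shell
# Hypothetical sample data: 12493 and 12475 repeat in the middle column.
printf '%s\n' \
  '1. 12493 6530' \
  '2. 12475 5462' \
  '3. 12493 5450' \
  '4. 12413 5258' \
  '5. 12475 4454' \
  '6. 12493 3859' > sample_uniq.txt

# Select column 2, sort so equal values are adjacent,
# then let uniq -d keep one line per value that occurs more than once.
dups_uniq=$(awk '{print $2}' sample_uniq.txt | sort | uniq -d)
echo "$dups_uniq"
```

The output comes out in sorted order, since uniq only compares neighbouring lines and therefore depends on the preceding sort.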