How to generate a list of unique lines in a text file using a Linux shell script?
Suppose I have a file that contains a bunch of lines, some repeating:
line1
line1
line1
line2
line3
line3
line3
What Linux command(s) should I use to generate a list of unique lines:
line1
line2
line3
Does this change if the file is unsorted, i.e. repeated lines may not appear in contiguous blocks?
If you don't mind the output being sorted, use
sort -u
This sorts the lines and removes duplicates in one step.
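A minimal sketch, assuming a sample file named lines.txt (the filename and contents are just for illustration):

```shell
# Sample file with unsorted, repeating lines.
printf 'line3\nline1\nline3\nline2\nline1\n' > lines.txt

# sort -u sorts the lines and keeps exactly one copy of each.
sort -u lines.txt
# line1
# line2
# line3
```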
Use cat to output the contents, piped to sort to sort them, piped to uniq to print out the unique values:

cat test1.txt | sort | uniq
You don't need the sort part if the file contents are already sorted.
Create a new sorted file containing only the unique lines:
sort -u file >> unique_file
Create a new file with the unique lines, preserving the original order (note that uniq only removes adjacent duplicates):
cat file | uniq >> unique_file
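A small sketch of this order-preserving variant (the filenames file and unique_file follow the command above; the sample data is hypothetical):

```shell
# Input whose duplicates are already grouped in adjacent blocks.
printf 'line1\nline1\nline2\nline3\nline3\n' > file

# uniq keeps the original order and collapses each adjacent run to one line.
uniq file > unique_file
cat unique_file
# line1
# line2
# line3
```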
If we do not care about the order, then the best solution is actually:
sort -u file
If we also want to ignore case, we can add -f (case is folded for the comparison only, so one line from each case-insensitive group of duplicates is kept, with its case unchanged):
sort -fu file
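To see the case folding in action, a sketch with a hypothetical file mixed_case.txt (which of the equal-ignoring-case variants survives may depend on the sort implementation, so only the line count is shown):

```shell
# 'Line1' and 'line1' compare equal under -f; 'LINE2' is distinct.
printf 'Line1\nline1\nLINE2\n' > mixed_case.txt

# Two lines remain: one from the Line1/line1 group, plus LINE2.
sort -fu mixed_case.txt | wc -l
```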
It would seem that an even better idea would be to use the command:
uniq file
and if we also want to ignore case (the first line of each run of duplicates is returned, with its case unchanged):
uniq -i file
However, this may return a completely different result than the sort command does, because uniq does not detect repeated lines unless they are adjacent.
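The adjacency caveat can be demonstrated with a hypothetical file scattered.txt whose duplicates are not next to each other:

```shell
# The two copies of line1 are separated by line2.
printf 'line1\nline2\nline1\n' > scattered.txt

# uniq alone misses the scattered duplicate: all 3 lines come through.
uniq scattered.txt

# Sorting first makes the duplicates adjacent, so only 2 lines remain.
sort scattered.txt | uniq
```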