Removing both duplicates (not just the repeated) from a text file?
By this I mean erasing every row in a text file that is repeated, NOT just the extra copies. I mean both the row that is duplicated and its duplicates. This would leave me with only the rows that weren't repeated. Perhaps a regular expression could do this in Notepad++? But which one? Any other methods?
If you're on a Unix-like system, you can use the uniq command with the -u option, which prints only the lines that are not repeated.
ezra@ubuntu:~$ cat test.file
ezra
ezra
john
user
ezra@ubuntu:~$ uniq -u test.file
john
user
Note that identical rows must be adjacent for this to work. You'll have to sort the file first if they're not.
ezra@ubuntu:~$ cat test.file
ezra
john
ezra
user
ezra@ubuntu:~$ uniq -u test.file
ezra
john
ezra
user
ezra@ubuntu:~$ sort test.file | uniq -u
john
user
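Sorting changes the order of the lines that remain. If the original order matters, one possible alternative (not part of the answer above) is a two-pass awk command that reads the file twice: once to count each line, once to print the lines seen exactly once. The function name unique_only is only illustrative.

```shell
# Illustrative helper (hypothetical name): print only the lines that occur
# exactly once in the file, keeping them in their original order.
unique_only() {
    # First pass (NR == FNR): count how often each line appears.
    # Second pass: print a line only if its count is 1.
    awk 'NR == FNR { count[$0]++; next } count[$0] == 1' "$1" "$1"
}
```

Called as unique_only test.file on the unsorted file above, this should print john and user without needing sort.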
If you have access to a regex engine that supports PCRE-style syntax, this is straightforward. Like uniq, it only catches duplicates that sit on adjacent lines:
s/(?:^|(?<=\n))(.*)\n(?:\1(?:\n|$))+//g
(?:^|(?<=\n)) # Behind us is beginning of string or newline
(.*)\n # Capture group 1: all characters up until next newline
(?: # Start non-capture group
\1 # backreference to what was captured in group 1
(?:\n|$) # a newline or end of string
)+ # End non-capture group, do this 1 or more times
The regex is applied to the whole text as a single string:
use strict; use warnings;
my $str =
'hello
this is
this is
this is
that is';
$str =~ s/
(?:^|(?<=\n))
(.*)\n
(?:
\1
(?:\n|$)
)+
//xg;
print "'$str'\n";
__END__
Output:
'hello
that is'