Remove both copies of a duplicated line (not just the extra copy) from a text file?

Question

By this I mean: erase all rows in a text file that are repeated, not just the duplicates. That is, remove both the original row and every repeated copy of it. This would leave me with only the rows that were never repeated. Perhaps a regular expression could do this in Notepad++? But which one? Are there any other methods?

Answer 1

If you're on a Unix-like system, you can use the uniq command with the -u option, which prints only the lines that are not repeated.

ezra@ubuntu:~$ cat test.file
ezra
ezra
john
user
ezra@ubuntu:~$ uniq -u test.file
john
user

Note that the identical rows must be adjacent. You'll have to sort the file first if they're not.

ezra@ubuntu:~$ cat test.file
ezra
john
ezra
user
ezra@ubuntu:~$ uniq -u test.file
ezra
john
ezra
user
ezra@ubuntu:~$ sort test.file | uniq -u
john
user
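
If the original line order matters, sorting may not be ideal, since sort test.file | uniq -u returns the surviving rows in sorted order. A minimal Python sketch that counts every line first and then keeps the ones seen exactly once, without sorting (drop_repeated_lines is a hypothetical helper name, not part of any tool mentioned here):

```python
from collections import Counter


def drop_repeated_lines(lines):
    """Return only the lines that occur exactly once, in original order."""
    counts = Counter(lines)  # first pass: count every line
    return [line for line in lines if counts[line] == 1]  # second pass: filter


# Same data as the unsorted test.file above
lines = ["ezra", "john", "ezra", "user"]
print("\n".join(drop_repeated_lines(lines)))
# prints:
# john
# user
```

For a real file, the list could be built with something like open(path).read().splitlines(), at the cost of holding the whole file in memory.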

Answer 2

If you have access to a regex engine that supports PCRE-style syntax, this is straightforward, as long as the repeated lines are adjacent:

s/(?:^|(?<=\n))(.*)\n(?:\1(?:\n|$))+//g

(?:^|(?<=\n))     # Behind us is beginning of string or newline
(.*)\n            # Capture group 1: all characters up until next newline
(?:               # Start non-capture group
    \1                # backreference to what was captured in group 1
    (?:\n|$)          # a newline or end of string
)+                # End non-capture group, do this 1 or more times

The context is a single string holding the whole text:

use strict; use warnings;

my $str = 
'hello
this is
this is
this is
that is';

$str =~ s/
          (?:^|(?<=\n))
          (.*)\n
          (?:
              \1
              (?:\n|$)
          )+
  //xg;

print "'$str'\n";

__END__

Output:

'hello
that is'
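
For comparison, the same pattern appears to carry over to Python's re module unchanged. This is a minimal sketch, assuming the whole text is held in one string; drop_adjacent_repeats is a hypothetical name. Like the Perl version, it only removes repeats that sit on adjacent lines, so non-adjacent duplicates are left alone.

```python
import re

# Same pattern as the Perl substitution above, written on one line.
PATTERN = re.compile(r"(?:^|(?<=\n))(.*)\n(?:\1(?:\n|$))+")


def drop_adjacent_repeats(text):
    """Delete every run of identical adjacent lines, including the first copy."""
    return PATTERN.sub("", text)


text = "hello\nthis is\nthis is\nthis is\nthat is"
print(repr(drop_adjacent_repeats(text)))
# prints 'hello\nthat is'
```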

Answer 1
2 2011-03-11 21:47:47

Answer 2
1
