How to remove both matching lines while removing duplicates

Question

I have a large text file called "main" containing a list of email addresses, and I have already sent mail to some of them. I also have a list of 'sent' addresses. Now I want to remove the 'sent' addresses from "main".

In other words, I want to remove both of the matching lines from the text file, not just the duplicate copy. Example:

I have:

email@email.com
test@test.com
email@email.com

I want:

test@test.com

Is there an easy way to achieve this? Please suggest a tool or method, keeping in mind that the text file is larger than 10 MB.

Answer 1

In a terminal:

cat test| sort | uniq -c | awk -F" " '{if($1==1) print $2}'
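The pipeline above counts how often each line occurs and prints only the lines seen exactly once, so it appears to assume that both lists have been combined into the single file test. As a minimal sketch of the same idea, uniq -u keeps only the lines that are not repeated. The scratch directory and the file names main.txt, sent.txt and remaining.txt are assumptions made for illustration. The approach also assumes that neither list contains internal duplicates; an address repeated only inside main.txt would be dropped, and an address present only in sent.txt would be kept.

```shell
# Work in a scratch directory so no real files are touched (illustrative setup).
workdir=$(mktemp -d)
printf '%s\n' email@email.com test@test.com > "$workdir/main.txt"
printf '%s\n' email@email.com > "$workdir/sent.txt"

# Combine both lists, sort so identical lines become adjacent,
# then keep only the lines that occur exactly once.
sort "$workdir/main.txt" "$workdir/sent.txt" | uniq -u > "$workdir/remaining.txt"

cat "$workdir/remaining.txt"
```

With the sample data from the question, only test@test.com is left in remaining.txt.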

Answer 2

I use Cygwin a lot for such tasks, as the Unix command line is incredibly powerful.

Here's how to achieve what you want:

cat main.txt | sort -u | grep -Fvxf sent.txt

sort -u will remove duplicates (by sorting the main.txt file first), and grep will take care of removing the unwanted addresses.

Here's what the grep options mean:

-F plain-text (fixed-string) search
-v invert the results
-x force the whole line to match the pattern
-f read patterns from the specified file
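To see why -F and -x matter for email addresses, here is a small worked sketch. The addresses, the scratch directory and the output file name remaining.txt are made up for illustration. Without -x, the pattern bob@example.com would also remove bob@example.com.au, and without -F the dots would be treated as regular-expression wildcards.

```shell
# Illustrative sample data in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' alice@example.com bob@example.com bob@example.com.au alice@example.com > "$demo/main.txt"
printf '%s\n' bob@example.com > "$demo/sent.txt"

# sort -u collapses the duplicate alice@example.com;
# grep -Fvxf then drops every line that is exactly equal to a line in sent.txt.
sort -u "$demo/main.txt" | grep -Fvxf "$demo/sent.txt" > "$demo/remaining.txt"

cat "$demo/remaining.txt"
```

Here remaining.txt should contain alice@example.com and bob@example.com.au, since only the exact line bob@example.com is listed in sent.txt.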

Oh, and if your files are in Windows format (CR LF newlines), you'll have to do this instead:

cat main.txt | dos2unix | sort -u | grep -Fvxf <(cat sent.txt | dos2unix)
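If dos2unix is not installed, a rough equivalent is to strip the carriage returns with tr. The sketch below is only an assumption about how that could look: it writes the cleaned sent list to a temporary copy instead of using the <( ... ) process substitution, so it should also run in a plain POSIX sh. The scratch directory and the names sent.unix.txt and output.txt are hypothetical.

```shell
# Illustrative CR LF test files in a scratch directory.
crlf=$(mktemp -d)
printf 'email@email.com\r\ntest@test.com\r\nemail@email.com\r\n' > "$crlf/main.txt"
printf 'email@email.com\r\n' > "$crlf/sent.txt"

# Strip the carriage returns from the sent list into a temporary copy.
tr -d '\r' < "$crlf/sent.txt" > "$crlf/sent.unix.txt"

# Same pipeline as before, with tr standing in for dos2unix.
tr -d '\r' < "$crlf/main.txt" | sort -u | grep -Fvxf "$crlf/sent.unix.txt" > "$crlf/output.txt"

cat "$crlf/output.txt"
```

Note that tr -d '\r' removes every carriage return in the file, not only those at line ends, which is usually harmless for a list of addresses.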

Just like with the Windows command line, you can simply add:

> output.txt

at the end of the command line to redirect the output to a text file.

Answer 1
0 2014-09-20 23:06:30

Answer 2
0 2014-09-20 23:08:29
