Extracting unique and non-unique strings to separate output files
I am having trouble extracting only the lines that are not duplicated, and only the lines that are duplicated, from a test file. The input file contains both duplicate and non-duplicate lines.
I have created a logging function and I can extract all distinct lines from the file to a separate file, but that output includes both the lines that have duplicates and the lines that don't. I need to separate them.
This is what I have so far:
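static void Dupes(string path1, string path2)
{
    string log = "log.txt"; // log file name (not used in this snippet)
    // The using blocks close both files even if an exception is thrown.
    using (var sr = new StreamReader(File.OpenRead(path1)))
    using (var sw = new StreamWriter(File.OpenWrite(path2)))
    {
        // Hash codes of the lines already written.
        // Note: two different lines can share a hash code.
        var lines = new HashSet<int>();
        while (!sr.EndOfStream)
        {
            string line = sr.ReadLine();
            int hc = line.GetHashCode();
            if (lines.Contains(hc))
                continue; // already written once, skip the repeat
            lines.Add(hc);
            sw.WriteLine(line);
        }
    }
}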
Ideally this would be split into two functions, so they can be called to perform different actions on the output contents.
Use LINQ to group the lines, then check the count of each group:
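var lines = File.ReadAllLines(path1);
// Lines whose group contains exactly one element occur only once.
var distincts = lines.GroupBy(l => l)
                     .Where(l => l.Count() == 1)
                     .Select(l => l.Key)
                     .ToList();
// Everything else occurs more than once.
var dupes = lines.Except(distincts).ToList();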
It's worth noting that Except doesn't return duplicates, which is something I just learned, so there is no need to call Distinct afterwards.
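Since the question asks for two separate functions, the grouping approach above could be wrapped roughly as follows. This is only a sketch under assumptions: the class name LineSplitter and the method names GetUniqueLines and GetDuplicatedLines are made up for illustration, and it assumes the whole file fits in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

static class LineSplitter // hypothetical name, not from the original answer
{
    // Lines that occur exactly once in the input.
    public static List<string> GetUniqueLines(IEnumerable<string> lines) =>
        lines.GroupBy(l => l)
             .Where(g => g.Count() == 1)
             .Select(g => g.Key)
             .ToList();

    // Lines that occur more than once, each reported a single time.
    public static List<string> GetDuplicatedLines(IEnumerable<string> lines) =>
        lines.GroupBy(l => l)
             .Where(g => g.Count() > 1)
             .Select(g => g.Key)
             .ToList();
}
```

Each result could then be written to its own file, for example with File.WriteAllLines(path2, LineSplitter.GetUniqueLines(File.ReadAllLines(path1))).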
You can do it as follows:
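var lines = File.ReadAllLines(path1);
// Pair every line with the number of times it occurs in the file.
// ToList() materializes the result so the counts are computed only once.
var countLines = lines.Select(d => new
{
    Line = d,
    Count = lines.Count(f => f == d),
}).ToList();
var UniqueLines = countLines.Where(d => d.Count == 1).Select(d => d.Line);
// A duplicated line appears here once per occurrence;
// append .Distinct() if each one should be listed only once.
var NotUniqueLines = countLines.Where(d => d.Count > 1).Select(d => d.Line);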
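The snippet above rescans the whole array for every line, so the work grows quadratically with the number of lines. For larger files, counting in a single pass with a Dictionary is one possible alternative. A minimal sketch, in which LineCounter and Split are hypothetical names that are not part of the original answer:

```csharp
using System.Collections.Generic;
using System.Linq;

static class LineCounter // hypothetical name
{
    // Returns the lines seen exactly once and the lines seen more than once.
    public static (List<string> Unique, List<string> Duplicated) Split(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            counts.TryGetValue(line, out int n); // n stays 0 when the line is new
            counts[line] = n + 1;
        }

        var unique = counts.Where(kv => kv.Value == 1).Select(kv => kv.Key).ToList();
        var duplicated = counts.Where(kv => kv.Value > 1).Select(kv => kv.Key).ToList();
        return (unique, duplicated);
    }
}
```

The order of the returned lines should not be relied on, because Dictionary does not guarantee an enumeration order. Passing File.ReadLines(path1) instead of File.ReadAllLines(path1) would avoid loading the whole file into an array first.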