Getting unique items from a list of strings
I have a very simple text file parsing app which searches for an email address and, if one is found, adds it to a list.
Currently there are duplicate email addresses in the list and I'm looking for a quick way of trimming the list down to only contain distinct values - without iterating over them one by one :)
Here's the code:
var emailLines = new List<string>();
using (var stream = new StreamReader(@"C:\textFileName.txt"))
{
    while (!stream.EndOfStream)
    {
        var currentLine = stream.ReadLine();
        if (!string.IsNullOrEmpty(currentLine) && currentLine.StartsWith("Email: "))
        {
            emailLines.Add(currentLine);
        }
    }
}
If you just need unique items, you could add your items to a HashSet<string> instead of a List<string>. Note that a HashSet has no implied order. If you need an ordered set (one that keeps its items sorted), you could use SortedSet<string> instead.
var emailLines = new HashSet<string>();
Then there would be no duplicates.
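As a minimal sketch of that idea, the filtering loop from the question could fill a HashSet<string> directly. The class and method names here (UniqueEmailsSketch, CollectUnique) are made up for illustration, and the lines are passed in as a sequence rather than read from the file so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueEmailsSketch
{
    // Hypothetical helper: keeps each "Email: " line once, however often it appears.
    public static HashSet<string> CollectUnique(IEnumerable<string> lines)
    {
        var unique = new HashSet<string>();
        foreach (var line in lines)
        {
            if (!string.IsNullOrEmpty(line) && line.StartsWith("Email: "))
            {
                // Add returns false and leaves the set unchanged
                // when the value is already present.
                unique.Add(line);
            }
        }
        return unique;
    }
}
```

If sorted output is wanted afterwards, the result can be copied with new SortedSet<string>(unique).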
To remove duplicates from a List<string>, you could use the LINQ extension method Distinct() (defined on IEnumerable<T> in System.Linq):
IEnumerable<string> distinctEmails = emailLines.Distinct();
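Distinct() does not modify the list it is called on, and the query is only evaluated when it is enumerated. A small sketch of materializing the result into a new list follows; the names DistinctSketch and WithoutDuplicates are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Hypothetical helper: returns a new list without duplicates;
    // the input list is left unchanged.
    public static List<string> WithoutDuplicates(List<string> emailLines)
    {
        // Distinct() is evaluated lazily; ToList() runs the query
        // and stores the result.
        return emailLines.Distinct().ToList();
    }
}
```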
Try the following:
var emailLines = File.ReadAllLines(@"c:\textFileName.txt")
    .Where(x => !String.IsNullOrEmpty(x) && x.StartsWith("Email: "))
    .Distinct()
    .ToList();
The downside to this approach is that it reads all of the lines in the file into a string[]. This happens immediately, and for large files it will create a correspondingly large array. It's possible to get back the lazy reading of lines by using a simple iterator:
public static IEnumerable<string> ReadAllLinesLazy(string path) {
    using (var stream = new StreamReader(path)) {
        while (!stream.EndOfStream) {
            yield return stream.ReadLine();
        }
    }
}
The File.ReadAllLines call above can then simply be replaced with a call to this function.
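Assuming .NET Framework 4 or later is available, the built-in File.ReadLines also enumerates a file line by line instead of loading it into an array, so it could stand in for a hand-written iterator. This is a sketch only; the names LazyReadSketch and DistinctEmailLines are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyReadSketch
{
    // Hypothetical helper; the caller supplies the path.
    public static List<string> DistinctEmailLines(string path)
    {
        // File.ReadLines streams lines as they are requested,
        // rather than building a string[] of the whole file first.
        return File.ReadLines(path)
            .Where(x => !string.IsNullOrEmpty(x) && x.StartsWith("Email: "))
            .Distinct()
            .ToList();
    }
}
```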
IEnumerable/LINQ goodness (great for large files - only the distinct matching lines are ever kept in memory):
// using System.Linq;
var emailLines = ReadFileLines(@"C:\textFileName.txt")
    .Where(line => line.StartsWith("Email: "))
    .Distinct()
    .ToList();
public IEnumerable<string> ReadFileLines(string fileName)
{
    using (var stream = new StreamReader(fileName))
    {
        while (!stream.EndOfStream)
        {
            yield return stream.ReadLine();
        }
    }
}