Get unique items from a list of strings

Question

I have a very simple text-file parsing app that searches for email addresses and adds each one it finds to a list.

Currently the list contains duplicate email addresses, and I'm looking for a quick way to trim it down to distinct values only, without iterating over them one by one :)

Here's the code:

// using System.Collections.Generic;
// using System.IO;

var emailLines = new List<string>();
using (var stream = new StreamReader(@"C:\textFileName.txt"))
{
    while (!stream.EndOfStream)
    {
        var currentLine = stream.ReadLine();

        if (!string.IsNullOrEmpty(currentLine) && currentLine.StartsWith("Email: "))
        {
            emailLines.Add(currentLine);
        }
    }
}

Answer 1

If you just need unique items, you could add your items to a HashSet instead of a List. Note that a HashSet has no guaranteed order. If you need an ordered set, you could use SortedSet instead.

var emailLines = new HashSet<string>();

Then there would be no duplicates.
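A minimal sketch of the HashSet approach applied to the question's own reading loop. To keep it runnable it assumes a temporary file with a few made-up sample lines in place of C:\textFileName.txt, and it is written as C# 9+ top-level statements. HashSet<string>.Add returns false and stores nothing when the item is already present; SortedSet<string> rejects duplicates the same way but enumerates in sorted order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustrative input (assumption): a temp file stands in for C:\textFileName.txt.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "Name: Alice",
    "Email: bob@example.com",
    "Email: alice@example.com",
    "",
    "Email: bob@example.com"
});

// The question's loop, collecting into a HashSet<string> instead of a List<string>.
var emailLines = new HashSet<string>();
using (var stream = new StreamReader(path))
{
    while (!stream.EndOfStream)
    {
        var currentLine = stream.ReadLine();
        if (!string.IsNullOrEmpty(currentLine) && currentLine.StartsWith("Email: "))
        {
            // Add returns false (and stores nothing) if the line is already present.
            emailLines.Add(currentLine);
        }
    }
}
File.Delete(path);

// SortedSet<string> also rejects duplicates, but enumerates in sorted (here ordinal) order.
var sortedEmails = new SortedSet<string>(emailLines, StringComparer.Ordinal);
foreach (var email in sortedEmails)
{
    Console.WriteLine(email);
}
```

With this sample input the second "Email: bob@example.com" line is silently dropped, so the set ends up with two entries, printed with alice's line first.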

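To get the distinct items of an existing List, you could use the LINQ extension method Enumerable.Distinct() (requires using System.Linq). It returns a new sequence rather than removing duplicates from the list in place: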

IEnumerable<string> distinctEmails = emailLines.Distinct();
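A short sketch of Distinct() on an existing list, using hypothetical sample data in place of the list built by the question's loop. It shows that the original list is left untouched and, as an optional assumption, how the Distinct overload that takes an IEqualityComparer<string> can treat addresses differing only in letter case as duplicates (whether that is appropriate for your addresses is a judgement call). The documentation describes Distinct's result as unordered, so the sketch only checks counts and membership.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the list built by the question's loop.
var emailLines = new List<string>
{
    "Email: bob@example.com",
    "Email: alice@example.com",
    "Email: bob@example.com",
    "Email: Alice@Example.com"
};

// Distinct() produces a new sequence; ToList() materializes it. emailLines is not modified.
List<string> distinctEmails = emailLines.Distinct().ToList();

// Optional (assumption): consider lines equal if they differ only in case.
List<string> distinctIgnoreCase = emailLines
    .Distinct(StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(emailLines.Count);          // 4
Console.WriteLine(distinctEmails.Count);      // 3
Console.WriteLine(distinctIgnoreCase.Count);  // 2
```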

Answer 2

Try the following:

// using System.IO;
// using System.Linq;

var emailLines = File.ReadAllLines(@"c:\textFileName.txt")
  .Where(x => !String.IsNullOrEmpty(x) && x.StartsWith("Email: "))
  .Distinct()
  .ToList();

The downside of this approach is that it reads all of the lines in the file into a string[]. This happens immediately, so a large file produces a correspondingly large array. You can get lazy, line-by-line reading back by using a simple iterator:

public static IEnumerable<string> ReadAllLinesLazy(string path) { 
  using ( var stream = new StreamReader(path) ) {
    while (!stream.EndOfStream) {
      yield return stream.ReadLine();
    }
  }
}

The File.ReadAllLines call above can then simply be replaced with a call to this function.
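A self-contained sketch of the pipeline above with File.ReadAllLines swapped for the lazy ReadAllLinesLazy iterator. Assumptions: a temporary file with made-up sample lines stands in for c:\textFileName.txt, and the iterator is written as a static local function so the example compiles as C# 9+ top-level statements. For comparison it also uses File.ReadLines, which .NET 4.0 and later provide as a built-in lazily evaluated alternative to File.ReadAllLines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Illustrative input (assumption): a temp file with one duplicate address.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "Email: bob@example.com",
    "Phone: 555-0100",
    "Email: alice@example.com",
    "Email: bob@example.com"
});

// The same query, fed by the lazy iterator instead of File.ReadAllLines.
List<string> viaIterator = ReadAllLinesLazy(path)
    .Where(x => !String.IsNullOrEmpty(x) && x.StartsWith("Email: "))
    .Distinct()
    .ToList();

// File.ReadLines (.NET 4.0+) is the built-in lazy equivalent.
List<string> viaReadLines = File.ReadLines(path)
    .Where(x => !String.IsNullOrEmpty(x) && x.StartsWith("Email: "))
    .Distinct()
    .ToList();

// Both queries were fully enumerated by ToList(), so their readers are closed.
File.Delete(path);

foreach (var line in viaIterator)
{
    Console.WriteLine(line);
}

// The iterator from the answer, as a local function. Lines are read one at a
// time, only when the query pulls the next item.
static IEnumerable<string> ReadAllLinesLazy(string fileName)
{
    using (var stream = new StreamReader(fileName))
    {
        while (!stream.EndOfStream)
        {
            yield return stream.ReadLine();
        }
    }
}
```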

Answer 3

IEnumerable/LINQ goodness (great for large files: the file is streamed, so only the distinct matching lines are ever kept in memory):

// using System.Collections.Generic;
// using System.IO;
// using System.Linq;

var emailLines = ReadFileLines(@"C:\textFileName.txt")
    .Where(line => line.StartsWith("Email: "))
    .Distinct()
    .ToList();

public static IEnumerable<string> ReadFileLines(string fileName)
{
    using (var stream = new StreamReader(fileName))
    {
        while (!stream.EndOfStream)
        {
            yield return stream.ReadLine();
        }
    }
}
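To make the memory claim concrete, here is a hypothetical hand-written equivalent of the Where(...).Distinct() part. Lines are streamed one at a time, non-matching lines are never stored, and the only buffer is a HashSet<string> of the distinct matching lines seen so far. This is conceptually similar to how Distinct avoids duplicates, but DistinctMatching and the sample data are illustrative assumptions, not part of the answer or of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample input standing in for ReadFileLines(...).
IEnumerable<string> lines = new[]
{
    "Email: bob@example.com",
    "Subject: hello",
    "Email: alice@example.com",
    "Email: bob@example.com",
    "Body: hi"
};

List<string> emailLines = DistinctMatching(lines, "Email: ").ToList();

foreach (var email in emailLines)
{
    Console.WriteLine(email);
}

// Hypothetical helper: yields each line that starts with the prefix the first
// time it is seen. Only distinct matching lines are ever added to 'seen'.
static IEnumerable<string> DistinctMatching(IEnumerable<string> source, string prefix)
{
    var seen = new HashSet<string>();
    foreach (var line in source)
    {
        // HashSet.Add returns false when the line was already seen.
        if (line != null && line.StartsWith(prefix, StringComparison.Ordinal) && seen.Add(line))
        {
            yield return line;
        }
    }
}
```

Because the helper yields lines in the order they first appear, this sketch keeps the file's order, which the documentation does not promise for Distinct itself.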

3 answers

Answer 1: score 7, posted 2010-09-24 03:58:28
Answer 2: score 3, accepted, posted 2010-09-24 04:02:18
Answer 3: score 1, posted 2010-09-24 04:05:16
