How do I return a list of Distinct Words using LINQ in C#?

The goal is to sort through a text (i.e. a speech) and output a list of the distinct words in the speech to a textbox. I have read through a lot of tips on the boards and played around a lot, but at this point I am more confused than when I started. Here is my code:

   private void GenerateList(string[] wordlist)
    {
       List<string> wordList = new List<string>();

        for (int i = 0; i < wordlist.Length; i++)
        {
            wordList.Add(wordlist[i]);
        }

        var uniqueStr = from item in wordList.Distinct().ToList()
                        orderby item
                        select item;


        for (int i = 0; i < uniqueStr.Count(); i++ )
        {
            txtOutput.Text = uniqueStr.ElementAt(i) + "\n";
        }

    }

At this point I am getting a return of one word. For the text I am using (the Gettysburg Address) it is the word "year", and it is the only instance of that word in the text.

I am passing the function each individual word, loaded into a string array that is then put into a list (which may be redundant?).

I hope this does what you need in a simple and efficient manner (using .Dump() from LINQPad):

void Main()
{
    // can be any IEnumerable<string> including string[]
    var words = new List<string>{"one", "two", "four", "three", "four", "a", "z"};

    words.ToDistinctList().Dump();

    // you would use txtOutput.Text = words.ToDistinctList()
}

static class StringHelpers
{
    public static string ToDistinctList(this IEnumerable<string> words)
    {
        return string.Join("\n", new SortedSet<string>(words));
    }
}

A few tips regarding your question:

  • There is no reason to turn the array into a list, because LINQ extension methods are defined on IEnumerable<T>, which is implemented by both the array and the list.
  • Make sure that all letters are in the same case; use ToLower, for instance.
  • You are overwriting txtOutput.Text in every iteration. Instead of setting a new value, append the new part to the existing value.

Here is a simple piece of code which produces the output you wanted:

IEnumerable<string> distinct =
    wordList
    .Select(word => word.ToLower())
    .Distinct()
    .OrderBy(word => word);

txtOutput.Text = string.Join("\n", distinct.ToArray());

On a related note, here is a very simple LINQ expression which returns distinct words from a text, where the whole text is specified as one string:

public static IEnumerable<string> SplitIntoWords(this string text)
{

    string pattern = @"\b[\p{L}]+\b";

    return
        Regex.Matches(text, pattern)
            .Cast<Match>()                          // Extract matches
            .Select(match => match.Value.ToLower()) // Change to same case
            .Distinct();                            // Remove duplicates

}

You can find more variations of the regex pattern for the same problem here: Regex and LINQ Query to Split Text into Distinct Words
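To show how the extension method above might be called, here is a minimal self-contained sketch. The class names TextExtensions and SplitDemo and the sample sentence are assumptions made for illustration; the method body is the one from the answer. An ordinal sort is added so the output does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container: extension methods must live in a static class.
public static class TextExtensions
{
    public static IEnumerable<string> SplitIntoWords(this string text)
    {
        // One or more Unicode letters between word boundaries.
        string pattern = @"\b[\p{L}]+\b";

        return Regex.Matches(text, pattern)
            .Cast<Match>()                          // Extract matches
            .Select(match => match.Value.ToLower()) // Change to same case
            .Distinct();                            // Remove duplicates
    }
}

public static class SplitDemo
{
    public static void Main()
    {
        // Made-up sample text, not the speech from the question.
        string text = "The cat saw the other cat";

        // Sort with an ordinal comparer for a culture-independent order.
        IEnumerable<string> words = text
            .SplitIntoWords()
            .OrderBy(word => word, StringComparer.Ordinal);

        // Prints cat, other, saw, the (one word per line).
        Console.WriteLine(string.Join(Environment.NewLine, words));
    }
}
```

In a WinForms handler the last line would presumably become an assignment to txtOutput.Text instead of a Console.WriteLine call.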

Here's how I'd simplify your code, as well as achieve what you want to achieve.

private void GenerateList(string[] wordlist)
{
    List<string> wordList = wordlist.ToList(); // initialize the list passing in the array


    var uniqueStr = from item in wordList.Distinct().ToList()
                    orderby item
                    select item;


    txtOutput.Text = String.Join("\n", uniqueStr.ToArray());
}

You can use the fact that the StringBuilder class has a fluent interface along with LINQ to simplify this greatly.

First, you can create the StringBuilder and concatenate all of the words into the same instance like so:

// The builder.
var builder = new StringBuilder();

// A copy of the builder *reference*.
var builderCopy = builder;

// Get the distinct list, order by the string.
builder = wordList
    // Get the distinct elements.
    .Distinct()
    // Order the words.
    .OrderBy(w => w)
    // Append each word to the builder.
    .Select(w => builderCopy.AppendLine(w))
    // Get the last or default element, this will
    // cycle through all of the elements.
    .LastOrDefault();

// If the builder is not null, then assign to the output, otherwise,
// assign null.
txtOutput.Text = builder == null ? null : builder.ToString();

Note that you don't have to actually materialize the list, because wordList is already materialized: it's an array (as a side note, typed arrays in C# implement the IList<T> interface).

The AppendLine method (like most of the methods on StringBuilder) returns the instance of the StringBuilder that the operation was performed on, which is why the LastOrDefault method call works; the selector simply calls the operation and returns the result (each item returned will be the same reference).

The builderCopy variable is used to avoid access to a modified closure (it never hurts to be safe).

The null check at the end is for the case where wordList doesn't contain any elements. In this case, the call to LastOrDefault will return null.
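Relying on side effects inside Select can be fragile: whether the selector runs for every element depends on how LastOrDefault is implemented, and some LINQ implementations may jump straight to the last element of an ordered sequence. As a hedged alternative (not part of the answer above), the sketch below keeps the fluent-interface idea but passes the builder through Aggregate, which visits every element. The names WordListFormatter, ToSortedDistinctText and AggregateDemo are hypothetical, and the ordinal comparer is an assumption made to keep the sort culture-independent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper class, named here for illustration only.
public static class WordListFormatter
{
    public static string ToSortedDistinctText(IEnumerable<string> words)
    {
        return words
            // Get the distinct elements.
            .Distinct()
            // Order the words (ordinal comparison is an assumption).
            .OrderBy(w => w, StringComparer.Ordinal)
            // AppendLine returns the same builder, so it can serve
            // directly as the accumulator for the next element.
            .Aggregate(new StringBuilder(), (sb, w) => sb.AppendLine(w))
            .ToString();
    }
}

public static class AggregateDemo
{
    public static void Main()
    {
        // Made-up sample input.
        var wordList = new[] { "two", "one", "two", "three" };

        Console.Write(WordListFormatter.ToSortedDistinctText(wordList));
    }
}
```

With an empty input, Aggregate returns the seed builder unchanged, so the result is an empty string rather than null and no null check is needed.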
