How do I return a list of Distinct Words using LINQ in C#?
The goal is to sort through a text (i.e. a speech) and output a list of the distinct words in the speech to a textbox. I have read through a lot of tips on the boards and played around a lot, but at this point I am more confused than when I started.
Here is my code:
private void GenerateList(string[] wordlist)
{
    List<string> wordList = new List<string>();
    for (int i = 0; i < wordlist.Length; i++)
    {
        wordList.Add(wordlist[i]);
    }
    var uniqueStr = from item in wordList.Distinct().ToList()
                    orderby item
                    select item;
    for (int i = 0; i < uniqueStr.Count(); i++)
    {
        txtOutput.Text = uniqueStr.ElementAt(i) + "\n";
    }
}
At this point I only get one word back. For the text I am using (the Gettysburg Address) it is the word "year", and that is the only instance of that word in the text.
I am loading each individual word into a string array, which is then copied into a list inside the function (which may be redundant?).
I hope this does what you need in a simple and efficient manner (using .Dump() from LINQPad):
void Main()
{
    // can be any IEnumerable<string> including string[]
    var words = new List<string>{"one", "two", "four", "three", "four", "a", "z"};
    words.ToDistinctList().Dump();
    // you would use txtOutput.Text = words.ToDistinctList();
}
static class StringHelpers
{
    public static string ToDistinctList(this IEnumerable<string> words)
    {
        return string.Join("\n", new SortedSet<string>(words));
    }
}
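The SortedSet above compares words case-sensitively, so "Four" and "four" would both appear in the list. Below is a minimal sketch of a case-insensitive variant. It assumes you want every word reported in lower case, and the extension name ToDistinctLowerList is hypothetical, not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LowerCaseStringHelpers
{
    // Hypothetical helper: distinct words, lower-cased, sorted, one per line.
    public static string ToDistinctLowerList(this IEnumerable<string> words)
    {
        // Lower-case every word first so "Four" and "four" count as one word.
        var lowered = words.Select(w => w.ToLowerInvariant());
        // SortedSet drops duplicates; ordinal comparison gives a
        // culture-independent, predictable sort order.
        var set = new SortedSet<string>(lowered, StringComparer.Ordinal);
        return string.Join("\n", set);
    }
}
```

Lower-casing before building the set avoids having to decide which spelling survives when two words differ only by case.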
A few tips regarding your question:
LINQ methods such as Distinct and OrderBy are extension methods defined on IEnumerable<T>, which is implemented by both the array and the list, so copying the array into a List<string> first is not necessary.
Here is the simple piece of code which produces the output you wanted:
IEnumerable<string> distinct =
    wordList
        .Select(word => word.ToLower())
        .Distinct()
        .OrderBy(word => word);
txtOutput.Text = string.Join("\n", distinct.ToArray());
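Because the array already implements IEnumerable<string>, the same pipeline can run directly on the string[] the question passes in. Below is a minimal sketch of that. It assumes the result is returned as a string rather than written to txtOutput, and the class and method names are hypothetical:

```csharp
using System.Linq;

static class DistinctWords
{
    // Hypothetical helper: the same query as above, applied straight to the
    // string[] parameter, with no intermediate List<string>.
    public static string BuildOutput(string[] wordlist)
    {
        var distinct = wordlist
            .Select(word => word.ToLower())   // normalize case
            .Distinct()                       // drop duplicates
            .OrderBy(word => word);           // sort alphabetically
        // string.Join has an IEnumerable<string> overload (.NET 4 and later),
        // so the ToArray() call is optional.
        return string.Join("\n", distinct);
    }
}
```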
On a related note, here is a very simple LINQ expression which returns the distinct words from a text when the whole text is given as one string:
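using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Extension methods must live in a non-generic static class; this class name is illustrative.
public static class TextExtensions
{
    public static IEnumerable<string> SplitIntoWords(this string text)
    {
        string pattern = @"\b[\p{L}]+\b";
        return
            Regex.Matches(text, pattern)
                .Cast<Match>()                            // Extract matches
                .Select(match => match.Value.ToLower())   // Change to same case
                .Distinct();                              // Remove duplicates
    }
}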
You can find more variations of the regex pattern for the same problem here: Regex and LINQ Query to Split Text into Distinct Words
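To go from the raw speech text to the sorted list the question wants, the regex extraction can be combined with a sort and a join. Below is a minimal sketch that reuses the pattern from the answer above. The class and method names are hypothetical, and it assumes the result is returned as a string for the textbox:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TextWords
{
    // Hypothetical helper: letter-only words, lower-cased, de-duplicated,
    // sorted with an ordinal comparer, and joined one per line.
    public static string DistinctSortedWords(string text)
    {
        var words = Regex.Matches(text, @"\b[\p{L}]+\b")
            .Cast<Match>()                     // MatchCollection is non-generic
            .Select(m => m.Value.ToLower())    // same case for comparison
            .Distinct()                        // remove duplicates
            .OrderBy(w => w, StringComparer.Ordinal);
        return string.Join("\n", words);
    }
}
```

Punctuation such as commas never reaches the list because the pattern only matches runs of letters.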
Here's how I'd simplify your code while also achieving what you want:
private void GenerateList(string[] wordlist)
{
    List<string> wordList = wordlist.ToList(); // initialize the list passing in the array
    var uniqueStr = from item in wordList.Distinct().ToList()
                    orderby item
                    select item;
    txtOutput.Text = String.Join("\n", uniqueStr.ToArray());
}
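The original code showed only one word because its loop assigns txtOutput.Text = ... on every pass. Each word replaces the previous one, so only the last (alphabetically) survives. If you prefer to keep an explicit loop, below is a minimal sketch that appends instead of assigning. It assumes a plain string result in place of the txtOutput textbox, and the names are hypothetical:

```csharp
using System.Linq;
using System.Text;

static class LoopFix
{
    // Hypothetical helper: returns the text that would go into txtOutput.Text.
    public static string GenerateListText(string[] wordlist)
    {
        var uniqueStr = wordlist.Distinct().OrderBy(item => item);
        var sb = new StringBuilder();
        foreach (var word in uniqueStr)
        {
            // Append rather than assign, so earlier words are kept.
            sb.Append(word).Append('\n');
        }
        return sb.ToString();
    }
}
```

In a multiline WinForms TextBox a bare "\n" may not display as a line break, so Environment.NewLine is the safer separator there.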
You can use the fact that the StringBuilder class has a fluent interface, together with LINQ, to simplify this greatly.
First, you can create the StringBuilder and concatenate all of the words into the same instance like so:
// The builder.
var builder = new StringBuilder();
// A copy of the builder *reference*.
var builderCopy = builder;
// Get the distinct list, order by the string.
builder = wordList
    // Get the distinct elements.
    .Distinct()
    // Order the words.
    .OrderBy(w => w)
    // Append each word to the builder.
    .Select(w => builderCopy.AppendLine(w))
    // Get the last or default element; this will
    // cycle through all of the elements.
    .LastOrDefault();
// If the builder is not null, then assign to the output; otherwise,
// assign null.
txtOutput.Text = builder == null ? null : builder.ToString();
Note that you don't have to materialize the list yourself, as wordList is already a materialized collection: it's an array (and, as a side note, single-dimensional arrays in C# implement the IList<T> interface).
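The side note about IList<T> can be checked directly. Below is a minimal sketch in which the class and method names are hypothetical. It shows that a string[] converts to IList<string> by an implicit reference conversion, without copying:

```csharp
using System.Collections.Generic;

static class ArrayAsList
{
    // Hypothetical helper: treats the array as an IList<string> without
    // creating a List<string> copy.
    public static int CountViaIList(string[] words)
    {
        IList<string> list = words;   // implicit reference conversion, no copy
        return list.Count;
    }

    // True when the IList<string> view is the very same object as the array.
    public static bool IsSameObject(string[] words)
    {
        IList<string> list = words;
        return object.ReferenceEquals(list, words);
    }
}
```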
The AppendLine method (like most of the methods on StringBuilder) returns the same StringBuilder instance that the operation was performed on, which is why the LastOrDefault call works: each Select step simply performs the append and returns the result, and every item returned is the same reference.
The builderCopy variable is used to avoid accessing a modified closure (it never hurts to be safe).
The null check at the end handles the case where wordList doesn't contain any elements; in that case, the call to LastOrDefault returns null.
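Using Select for its side effect depends on LastOrDefault forcing the enumeration. A different technique, swapped in here and not the answer's own, folds the words into one builder with Enumerable.Aggregate and a StringBuilder seed. Below is a minimal sketch of that approach. It assumes a string result, and the names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class AggregateVersion
{
    // Hypothetical helper: the seed builder is returned even for an empty
    // input, so no closure copy and no null check are needed.
    public static string JoinDistinctSorted(IEnumerable<string> wordList)
    {
        return wordList
            .Distinct()
            .OrderBy(w => w)
            .Aggregate(new StringBuilder(), (sb, w) => sb.AppendLine(w))
            .ToString();
    }
}
```

For an empty input this returns an empty string rather than null.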
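Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.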