How to keep track of word counts in a text file
I'm trying to count the occurrences of every word within a text file (case-insensitively) and store the words and their counts in a list.
This is the class for each word to be stored in the list:
public class WordItem
{
public string Word { get; set; }
public int Count { get; set; }
}
and this is my function for parsing the text file:
public List<WordItem> FindWordCount()
{
//I've successfully parsed the text file into a list
//of words and stripped punctuation up to this point
//and stored them in List<string> wordlist.
List<string> wordlist;
List<WordItem> entries = new List<WordItem>();
foreach (string word in wordlist)
{
// Every word is added as a new entry, so duplicates end up in the list.
WordItem temp = new WordItem();
temp.Word = word;
temp.Count = 1;
entries.Add(temp);
}
return entries;
}
How can I edit my word count function to prevent duplicate words in the list, and instead increment the count value every time I find the word again?
I would use a Dictionary with a case-insensitive string comparer:
public IEnumerable<WordItem> FindWordCount(IEnumerable<string> wordlist)
{
var wordCount = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
foreach (string word in wordlist)
{
int count = 0;
bool contained = wordCount.TryGetValue(word, out count);
count++;
wordCount[word] = count;
}
foreach (var kv in wordCount)
yield return new WordItem { Word = kv.Key, Count = kv.Value };
}
You can use it in this way:
var wordList = new string[] { "A", "a", "b", "C", "a", "b" };
var wordCounts = FindWordCount(wordList).ToList();
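For completeness, here is a minimal end-to-end sketch of the same dictionary approach. The question does not show how the file was split into words, so the `Tokenize` helper and its regular expression are assumptions for illustration only, as are the `WordCountDemo` and `CountWords` names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCountDemo
{
    // Hypothetical tokenizer (not from the question): treats every run of
    // letters, digits, or apostrophes as one word and drops other punctuation.
    public static IEnumerable<string> Tokenize(string text)
    {
        foreach (Match m in Regex.Matches(text, @"[\p{L}\p{Nd}']+"))
            yield return m.Value;
    }

    // Counts words case-insensitively; the key keeps the casing of the
    // first occurrence of each word.
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
        foreach (string word in words)
        {
            int count;
            counts.TryGetValue(word, out count); // count stays 0 for a new word
            counts[word] = count + 1;
        }
        return counts;
    }
}
```

Calling `WordCountDemo.CountWords(WordCountDemo.Tokenize(File.ReadAllText(path)))` would then give one entry per distinct word, where `path` stands for your own file path. Lookups such as `counts["the"]` match regardless of case because the dictionary was created with the case-insensitive comparer.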
There are also neat single-line solutions:
IEnumerable<WordItem> countedList = wordlist.Distinct(StringComparer.InvariantCultureIgnoreCase).Select(word => new WordItem() { Word = word, Count = wordlist.Count(compWord => word.Equals(compWord, StringComparison.InvariantCultureIgnoreCase)) });
or, if you prefer a dictionary, in order to be able to look up specific words later:
Dictionary<string, int> dictionary = wordlist.Distinct(StringComparer.InvariantCultureIgnoreCase).ToDictionary(word => word, word => wordlist.Count(compWord => word.Equals(compWord, StringComparison.InvariantCultureIgnoreCase)), StringComparer.InvariantCultureIgnoreCase);
Performance is of course somewhat worse than Tim Smelters' solution, because the Count() call rescans the whole list for every distinct word (which leads to O(n^2)), but with C# 6.0 you'd be able to write the method as an expression-bodied member (using =>) instead of a block body.
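As a rough sketch of what that C# 6.0 form could look like, the single-line query can be wrapped in an expression-bodied method. The `WordCounter` class name and the parameter are assumptions for illustration; the query itself is the Distinct/Count approach from this answer, with a case-insensitive comparer passed to `Distinct`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

public static class WordCounter
{
    // Expression-bodied method (C# 6.0 or later): "=>" replaces the braces
    // and the return statement. Still O(n^2), because Count() rescans the
    // whole sequence once for every distinct word.
    public static IEnumerable<WordItem> FindWordCount(IEnumerable<string> wordlist) =>
        wordlist
            .Distinct(StringComparer.InvariantCultureIgnoreCase)
            .Select(word => new WordItem
            {
                Word = word,
                Count = wordlist.Count(compWord =>
                    word.Equals(compWord, StringComparison.InvariantCultureIgnoreCase))
            });
}
```

Because the query is lazy and enumerates `wordlist` repeatedly, it is best called with an in-memory collection such as a `List<string>` or an array.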
Simple and with your types:
public string[] wordList;
public class WordItem
{
public string Word { get; set; }
public int Count { get; set; }
}
public IEnumerable<WordItem> FindWordCount()
{
return from word in wordList
group word by word.ToLowerInvariant() into g
select new WordItem { Word = g.Key, Count = g.Count()};
}
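Note that grouping by `word.ToLowerInvariant()` stores every word in lower case. If the original casing matters, one possible variant (a sketch, not part of the answer above) is to group with a case-insensitive comparer instead, so the key keeps the casing of the first occurrence. The `GroupingWordCounter` name and the sort by count are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

public static class GroupingWordCounter
{
    public static List<WordItem> FindWordCount(IEnumerable<string> words)
    {
        return words
            // The comparer makes "A" and "a" fall into the same group.
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .Select(g => new WordItem { Word = g.Key, Count = g.Count() })
            // Optional: most frequent words first.
            .OrderByDescending(item => item.Count)
            .ToList();
    }
}
```

This runs in a single pass over the input, like the dictionary version, because `GroupBy` buckets the words with hashing rather than rescanning the list.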