How to keep track of word count in a text file

Question

I'm trying to count the occurrences of every word in a text file (case-insensitively) and store the words and their counts in a list.

This is the class for each word to be stored in the list:

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

and this is my function for parsing the text file:

public List<WordItem> FindWordCount()
{
    //I've successfully parsed the text file into a list
    //of words and stripped punctuation up to this point
    //and stored them in List<string> wordlist.

    List<string> wordlist;
    List<WordItem> entries = new List<WordItem>();

    foreach (string word in wordlist)
    {
        WordItem temp = new WordItem();
        temp.Word = word;
        temp.Count = 1;
        entries.Add(temp);
    }

    return entries;
}

How can I change my word count function so that it doesn't add duplicate words to the list, and instead increments the count each time a word appears again?

Answer 1

I would use a Dictionary with a case-insensitive string comparer:

public IEnumerable<WordItem> FindWordCount(IEnumerable<string> wordlist)
{
    var wordCount = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
    foreach (string word in wordlist)
    {
        // TryGetValue leaves count at 0 when the word has not been seen yet.
        int count;
        wordCount.TryGetValue(word, out count);
        wordCount[word] = count + 1;
    }
    foreach (var kv in wordCount)
        yield return new WordItem { Word = kv.Key, Count = kv.Value };
}

You can use it like this:

var wordList = new string[] { "A", "a", "b", "C", "a", "b" };
var wordCounts = FindWordCount(wordList).ToList();

Answer 2

There are also neat single-line solutions:

IEnumerable<WordItem> countedList = wordlist.Distinct(StringComparer.InvariantCultureIgnoreCase).Select(word => new WordItem() { Word = word, Count = wordlist.Count(compWord => word.Equals(compWord, StringComparison.InvariantCultureIgnoreCase)) });

or, if you prefer a dictionary so that you can look up specific words later:

Dictionary<string, int> dictionary = wordlist.Distinct(StringComparer.InvariantCultureIgnoreCase).ToDictionary(word => word, word => wordlist.Count(compWord => word.Equals(compWord, StringComparison.InvariantCultureIgnoreCase)), StringComparer.InvariantCultureIgnoreCase);

Performance is of course somewhat worse than Tim Smelters' solution, because calling Count() once per distinct word makes the whole thing O(n^2). On the other hand, with C# 6.0 you can write the method as an expression-bodied member, using a lambda-style => definition instead of a block body.
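As a rough sketch of what such an expression-bodied method could look like: the version below swaps the nested Count() call for GroupBy with an ignore-case comparer, so it is not the exact one-liner from this answer. The class name WordCounter is a hypothetical wrapper added only so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

// Hypothetical helper class; the name is an assumption, not from the answers.
public static class WordCounter
{
    // Expression-bodied method (C# 6.0+): "=>" replaces the braces and "return".
    // GroupBy with an ignore-case comparer buckets the words as it reads them,
    // so there is no Count() scan of the whole list per distinct word.
    public static IEnumerable<WordItem> FindWordCount(IEnumerable<string> wordlist) =>
        wordlist
            .GroupBy(word => word, StringComparer.InvariantCultureIgnoreCase)
            .Select(g => new WordItem { Word = g.Key, Count = g.Count() });
}
```

Called as WordCounter.FindWordCount(wordlist).ToList(), this should give one WordItem per distinct word regardless of casing.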

Answer 3

Simple, and using your types:

public string[] wordList;

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

public IEnumerable<WordItem> FindWordCount()
{
  return from word in wordList
         group word by word.ToLowerInvariant() into g
         select new WordItem { Word = g.Key, Count = g.Count()};
}
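One thing worth noticing about this query is that it groups on word.ToLowerInvariant(), so the Word values come back lowercased rather than in their original spelling. The sketch below wraps the same query in a method that takes the list as a parameter, which is an assumption made for the sake of a self-contained example (the answer reads a field instead). The class name LowercaseGroupingDemo is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

// Hypothetical wrapper class; the name is an assumption, not from the answer.
public static class LowercaseGroupingDemo
{
    // Same query as in the answer, but with the word list passed in.
    public static List<WordItem> FindWordCount(IEnumerable<string> wordList)
    {
        var query = from word in wordList
                    group word by word.ToLowerInvariant() into g
                    select new WordItem { Word = g.Key, Count = g.Count() };
        return query.ToList(); // materialize so the result can be enumerated repeatedly
    }
}
```

For the input { "A", "a", "b", "C", "a", "b" } this yields the items a (3), b (2) and c (1), with every key lowercased.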

Answer 1: score 7, accepted, 2015-06-09 13:51:32

Answer 2: score 0, 2015-06-09 14:06:26

Answer 3: score 0, 2015-06-09 14:11:33
