
Is this the best way to create a frequency table using LINQ?

I want to write a function that reads a file and counts the number of times each word occurs. Assuming the file-reading is handled and produces a list of strings representing each line in the file, I need a function to count the occurrence of each word. Firstly, is using a Dictionary<string,int> the best approach? The key is the word, and the value is the number of occurrences of that word.

I wrote this function which iterates through each line and each word in a line and builds up a dictionary:

static IDictionary<string, int> CountWords(IEnumerable<string> lines)
{
    var dict = new Dictionary<string, int>();
    foreach (string line in lines)
    {
        string[] words = line.Split(' ');
        foreach (string word in words)
        {
            // Increment the count for a known word, or start it at 1.
            if (dict.ContainsKey(word))
                dict[word]++;
            else
                dict.Add(word, 1);
        }
    }
    return dict;
}

However, I would like to write this function functionally, using LINQ (because LINQ is fun and I'm trying to improve my functional programming skills :D). I managed to come up with this expression, but I'm not sure whether it's the best way to do it functionally:

static IDictionary<string, int> CountWords2(IEnumerable<string> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .Aggregate(new Dictionary<string, int>(),
            (dict, word) =>
            {
                if (dict.ContainsKey(word))
                    dict[word]++;
                else
                    dict.Add(word, 1);
                return dict;
            });
}

So while I have two working solutions, I am also interested in learning what the best approach is to this problem. Anyone with insight on LINQ and FP?

As Tim Robinson wrote, you could use GroupBy with ToDictionary, like this:

    public static Dictionary<string, int> CountWords3(IEnumerable<string> strings)
    {
        return strings
            .SelectMany(s => s.Split(' '))
            .GroupBy(w => w)
            .ToDictionary(g => g.Key, g => g.Count());
    }

Take a look at GroupBy instead of Aggregate -- it will give you a set of IGrouping<string, string> objects. You'll be able to retrieve the count of each word by calling .Count() on each grouping.
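To make that intermediate step visible, here is a minimal sketch that stops at the groupings and formats each key with its count. The names GroupingDemo and DescribeGroups are hypothetical, made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupingDemo
{
    // Hypothetical helper: returns one "word: count" string per distinct word.
    public static List<string> DescribeGroups(IEnumerable<string> lines)
    {
        // GroupBy yields one IGrouping<string, string> per distinct word.
        // Key is the word; enumerating the grouping gives its occurrences.
        IEnumerable<IGrouping<string, string>> groups = lines
            .SelectMany(line => line.Split(' '))
            .GroupBy(word => word);

        // Count() on a grouping is the number of times that word occurred.
        return groups
            .Select(g => g.Key + ": " + g.Count())
            .ToList();
    }
}
```

Each grouping can then be turned into a dictionary entry with ToDictionary, using g.Key as the key and g.Count() as the value.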

The following should do the job.

static IDictionary<String, Int32> CountWords(IEnumerable<String> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .GroupBy(word => word)
        .ToDictionary(group => group.Key, group => group.Count());
}

If you want to use LINQ query syntax (rather than calling the LINQ extension methods directly), you can write:

var groups = from line in lines
             from s in line.Split(new[] { "\t", " " }, StringSplitOptions.RemoveEmptyEntries)
             group s by s into g
             select g;
var dic = groups.ToDictionary(g => g.Key, g => g.Count());

Your current implementation won't split on tabs and might include string.Empty as a "word", so I've changed the split to match what I think your intentions are.
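The difference between the two splits can be seen in a small sketch. The names SplitDemo, SplitOnSpace and SplitOnWhitespace are hypothetical and exist only for this comparison.

```csharp
using System;
using System.Linq;

static class SplitDemo
{
    // Splits on a single space only, as in the question's code.
    // Consecutive spaces produce empty entries, and tabs are not separators.
    public static string[] SplitOnSpace(string line)
    {
        return line.Split(' ');
    }

    // Splits on tabs and spaces and drops empty entries.
    public static string[] SplitOnWhitespace(string line)
    {
        return line.Split(new[] { "\t", " " }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the input "a  b\tc" (two spaces, then a tab), the first method returns three entries, "a", "" and "b\tc", while the second returns "a", "b" and "c".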
