简体   繁体   中英

Collection of strings to dictionary

Given an ordered collection of strings:

var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };

Use LINQ to create a dictionary of string to number of occurrences of that string in the collection:

IDictionary<string,int> stringToNumOccurrences = ...;

Preferably do this in a single pass over the strings collection...

var dico = strings.GroupBy(x => x).ToDictionary(x => x.Key, x => x.Count());

Timwi/Darin's suggestion will perform this in a single pass over the original collection, but it will create multiple buffers for the groupings. LINQ isn't really very good at doing this kind of counting, and a problem like this was my original motiviation for writing Push LINQ. You might like to read my blog post on it for more details about why LINQ isn't terribly efficient here.

Push LINQ and the rather more impressive implementation of the same idea - Reactive Extensions - can handle this more efficiently.

Of course, if you don't really care too much about the extra efficiency, go with the GroupBy answer :)

EDIT: I hadn't noticed that your strings were ordered. That means you can be much more efficient, because you know that once you've seen string x and then string y, if x and y are different, you'll never see x again. There's nothing in LINQ to make this particularly easier, but you can do it yourself quite easily:

public static IDictionary<string, int> CountEntries(IEnumerable<string> strings)
{
    var dictionary = new Dictionary<string, int>();

    using (var iterator = strings.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            // No entries
            return dictionary;
        }
        string current = iterator.Current;
        int currentCount = 1;
        while (iterator.MoveNext())
        {
            string next = iterator.Current;
            if (next == current)
            {
                currentCount++;
            }
            else
            {
                dictionary[current] = currentCount;
                current = next;
                currentCount = 1;
            }
        }
        // Write out the trailing result
        dictionary[current] = currentCount;
    }
    return dictionary;
}

This is O(n), with no dictionary lookups involved other than when writing the values. An alternative implementation would use foreach and a current value starting off at null... but that ends up being pretty icky in a couple of other ways. (I've tried it :) When I need special-case handling for the first value, I generally go with the above pattern.

Actually you could do this with LINQ using Aggregate , but it would be pretty nasty.

The standard LINQ way is this:

stringToNumOccurrences = strings.GroupBy(s => s)
                                .ToDictionary(g => g.Key, g => g.Count());

If this is actual production code, I'd go with Timwi's response .

If this is indeed homework and you're expected to write your own implementation, it shouldn't be too tough. Here are just a couple of hints to point you in the right direction:

  1. Dictionary<TKey, TValue> has a ContainsKey method.
  2. The IDictionary<TKey, TValue> interface's this[TKey] property is settable; ie, you can do dictionary[key] = 1 (which means you can also do dictionary[key] += 1 ).

From those clues I think you should be able to figure out how to do it "by hand."

If you are looking for a particularly efficient (fast) solution, then GroupBy is probably too slow for you. You could use a loop:

var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
var stringToNumOccurrences = new Dictionary<string, int>();
foreach (var str in strings)
{
    if (stringToNumOccurrences.ContainsKey(str))
        stringToNumOccurrences[str]++;
    else
        stringToNumOccurrences[str] = 1;
}
return stringToNumOccurrences;

This is a foreach version like the one that Jon mentions that he finds "pretty icky" in his answer. I'm putting it in here, so there's something concrete to talk about.

I must admit that I find it simpler than Jon's version and can't really see what's icky about it. Jon? Anyone?

static Dictionary<string, int> CountOrderedSequence(IEnumerable<string> source)
{
    var result = new Dictionary<string, int>();
    string prev = null;
    int count = 0;
    foreach (var s in source)
    {
        if (prev != s && count > 0)
        {
            result.Add(prev, count);
            count = 0;
        }
        prev = s;
        ++count;
    }
    if (count > 0)
    { 
        result.Add(prev, count);
    }
    return result;
}

Updated to add a necessary check for empty source - I still think it's simpler than Jon's :-)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM