Getting a count of unique strings from a List<string[]> into a dictionary

I want to take a List<string[]> as input and produce a dictionary as output. The keys are the unique strings found across all the arrays, and each value is an array of floats in which each position holds the number of times the key occurs in the corresponding string[] of the List<string[]>.

Here is what I have attempted so far:

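using System;
using System.Collections.Generic;
using System.Linq;

static class CT
{
    // Counts all terms in each array
    public static Dictionary<string, float[]> Termfreq(List<string[]> text)
    {
        // Collect the distinct words of each array
        List<string> unique = new List<string>();
        foreach (string[] s in text)
        {
            unique.AddRange(s.Distinct());
        }

        // De-duplicate again across arrays to get the dictionary keys
        string[] index = unique.Distinct().ToArray();

        // Stuck here: countset is never filled, so an empty dictionary is returned
        Dictionary<string, float[]> countset = new Dictionary<string, float[]>();

        return countset;
    }
}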



class Program
{
    static void Main()
    {
        // Sample input: two word arrays
        List<string[]> doc = new List<string[]>();
        string[] a = { "That", "is", "a", "cat" };
        string[] b = { "That", "bat", "flew", "over", "the", "cat" };
        doc.Add(a);
        doc.Add(b);

        Dictionary<string, float[]> ret = CT.Termfreq(doc);

        foreach (KeyValuePair<string, float[]> kvp in ret)
        {
            // Join the counts; printing kvp.Value directly would show only the array's type name
            Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, string.Join(", ", kvp.Value));
        }

        Console.ReadLine();
    }
}

I got stuck on the dictionary part. What is the most effective way to implement this?

It sounds like you could use something like:

var dictionary = doc
    .SelectMany(array => array)
    .Distinct()
    .ToDictionary(word => word,
                  word => doc.Select(array => array.Count(x => x == word))
                             .ToArray());

In other words, first find the distinct set of words, then for each word, create a mapping.

To create a mapping, look at each array in the original document, and find the count of the occurrences of the word in that array. (So each array maps to an int.) Use LINQ to perform that mapping over the whole document, with ToArray creating an int[] for a particular word... and that's the value for that word's dictionary entry.
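
If the two steps are easier to follow as explicit loops, the same mapping can be sketched roughly as below. The names TermCounts, CountPerArray and Demo are made up for this illustration, and the comparison is the ordinary case-sensitive == on strings, as in the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TermCounts
{
    // Hypothetical helper: a loop-based equivalent of the LINQ query above.
    public static Dictionary<string, int[]> CountPerArray(List<string[]> doc)
    {
        var result = new Dictionary<string, int[]>();

        // Step 1: walk the distinct set of words across every array.
        foreach (string word in doc.SelectMany(array => array).Distinct())
        {
            // Step 2: one counter slot per array in the document.
            var counts = new int[doc.Count];
            for (int i = 0; i < doc.Count; i++)
            {
                counts[i] = doc[i].Count(x => x == word);
            }
            result[word] = counts;
        }

        return result;
    }
}

static class Demo
{
    static void Main()
    {
        var doc = new List<string[]>
        {
            new[] { "That", "is", "a", "cat" },
            new[] { "That", "bat", "flew", "over", "the", "cat" }
        };

        foreach (var kvp in TermCounts.CountPerArray(doc))
        {
            Console.WriteLine("{0}: [{1}]", kvp.Key, string.Join(", ", kvp.Value));
        }
    }
}
```

For the sample document from the question, "That" and "cat" each map to { 1, 1 }, while "bat" maps to { 0, 1 } because it appears only in the second array.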

Note that this creates a Dictionary<string, int[]> rather than a Dictionary<string, float[]> - it seems more sensible to me, but you could always cast the result of Count to float if you really wanted to.
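
If you do want to keep the Dictionary<string, float[]> signature from the question, a minimal sketch of the cast might look like this. It reuses the CT and Termfreq names from the question; everything else is the same query with a (float) cast added.

```csharp
using System.Collections.Generic;
using System.Linq;

static class CT
{
    // Same query as above, with each per-array count cast to float
    // so the result matches the signature used in the question.
    public static Dictionary<string, float[]> Termfreq(List<string[]> text)
    {
        return text
            .SelectMany(array => array)
            .Distinct()
            .ToDictionary(word => word,
                          word => text.Select(array => (float) array.Count(x => x == word))
                                      .ToArray());
    }
}
```

With the doc list from the question, CT.Termfreq(doc)["cat"] holds { 1, 1 } and CT.Termfreq(doc)["bat"] holds { 0, 1 }. Note that this rescans every array once per distinct word, which is fine for small inputs but may be worth revisiting for large ones.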
