

Getting a count of unique strings from a List<string[]> into a dictionary

I want to input a List<string[]>.

The output should be a dictionary where the keys are the unique strings, used as an index, and each value is an array of floats in which each position holds the count of the key in the corresponding string[] of the List<string[]>.

Here is what I have attempted so far:

using System;
using System.Collections.Generic;
using System.Linq;

static class CT
{
    // Counts all terms in each array
    public static Dictionary<string, float[]> Termfreq(List<string[]> text)
    {
        List<string> unique = new List<string>();

        foreach (string[] s in text)
        {
            List<string> groups = s.Distinct().ToList();
            unique.AddRange(groups);
        }

        // Every distinct term across all arrays; these should become the keys
        string[] index = unique.Distinct().ToArray();

        Dictionary<string, float[]> countset = new Dictionary<string, float[]>();

        // Stuck here: how do I fill countset from index and text?

        return countset;
    }
}

class Program
{
    static void Main()
    {
        List<string[]> doc = new List<string[]>();
        string[] a = { "That", "is", "a", "cat" };
        string[] b = { "That", "bat", "flew", "over", "the", "cat" };
        doc.Add(a);
        doc.Add(b);

        Dictionary<string, float[]> ret = CT.Termfreq(doc);

        foreach (KeyValuePair<string, float[]> kvp in ret)
        {
            // string.Join prints the array contents instead of "System.Single[]"
            Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, string.Join(", ", kvp.Value));
        }

        Console.ReadLine();
    }
}

I got stuck on the dictionary part. What is the most effective way to implement this?
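For comparison, here is one possible way to fill `countset` with plain loops, in the spirit of the `Termfreq` attempt above, which builds the distinct terms but leaves the dictionary empty. This is a sketch, not the accepted answer: it walks the input once and uses `Dictionary.TryGetValue` to create each `float[]` the first time a term is seen, so the separate `index` array is not needed. It assumes C# 9 or later (.NET 5+) because it uses top-level statements with a static local function.

```csharp
using System;
using System.Collections.Generic;

// Single pass: for the array at position i, bump counts[i] for each of its terms.
// A term's float[] is created (all zeros) the first time that term is seen.
static Dictionary<string, float[]> Termfreq(List<string[]> text)
{
    var countset = new Dictionary<string, float[]>();
    for (int i = 0; i < text.Count; i++)
    {
        foreach (string term in text[i])
        {
            if (!countset.TryGetValue(term, out float[] counts))
            {
                counts = new float[text.Count]; // one slot per array in text
                countset[term] = counts;
            }
            counts[i]++;
        }
    }
    return countset;
}

// Sample input from the question
var doc = new List<string[]>
{
    new[] { "That", "is", "a", "cat" },
    new[] { "That", "bat", "flew", "over", "the", "cat" },
};

Dictionary<string, float[]> ret = Termfreq(doc);
foreach (KeyValuePair<string, float[]> kvp in ret)
    Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, string.Join(", ", kvp.Value));
```

Dictionary enumeration order is not guaranteed by the documentation, so avoid relying on the order in which keys are printed.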

It sounds like you could use something like:

var dictionary = doc
    .SelectMany(array => array)
    .Distinct()
    .ToDictionary(word => word,
                  word => doc.Select(array => array.Count(x => x == word))
                             .ToArray());
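The snippet above relies on `doc` from the question's `Main`. As a sketch, here it is as a complete program with the question's sample input, assuming C# 9 or later for top-level statements; the query itself is unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample input from the question
var doc = new List<string[]>
{
    new[] { "That", "is", "a", "cat" },
    new[] { "That", "bat", "flew", "over", "the", "cat" },
};

// Distinct words become the keys; each value holds one count per array in doc,
// so the value type is int[] rather than float[]
Dictionary<string, int[]> dictionary = doc
    .SelectMany(array => array)
    .Distinct()
    .ToDictionary(word => word,
                  word => doc.Select(array => array.Count(x => x == word))
                             .ToArray());

foreach (KeyValuePair<string, int[]> kvp in dictionary)
    Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, string.Join(", ", kvp.Value));
```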

In other words, first find the distinct set of words, then create a mapping for each word.

To create a mapping, look at each array in the original document and count the occurrences of the word in that array (so each array maps to an int). Use LINQ to perform that mapping over the whole document, with ToArray creating an int[] for a particular word; that array is the value of that word's dictionary entry.
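To see that per-array mapping in isolation, the sketch below applies just the inner expression to two words from the question's sample input. `CountsFor` is a hypothetical helper name introduced only for this illustration; top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var doc = new List<string[]>
{
    new[] { "That", "is", "a", "cat" },
    new[] { "That", "bat", "flew", "over", "the", "cat" },
};

// Hypothetical helper: maps each array in doc to the number of times
// word occurs in it, then collects one count per array, in document order.
int[] CountsFor(string word) =>
    doc.Select(array => array.Count(x => x == word)).ToArray();

int[] catCounts = CountsFor("cat"); // "cat" occurs once in each array
int[] isCounts = CountsFor("is");   // "is" occurs only in the first array

Console.WriteLine(string.Join(", ", catCounts)); // prints "1, 1"
Console.WriteLine(string.Join(", ", isCounts));  // prints "1, 0"
```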

Note that this creates a Dictionary<string, int[]> rather than a Dictionary<string, float[]>. That seems more sensible to me, but you could always cast the result of Count to float if you really wanted to.
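If the float[] signature from the question is required, a minimal sketch of that cast looks like this. It reuses the question's `Termfreq` name and signature, but the body is just the query above with `(float)` applied to each count; top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same query as above, with each Count cast to float so the result
// matches Dictionary<string, float[]>
static Dictionary<string, float[]> Termfreq(List<string[]> text) =>
    text.SelectMany(array => array)
        .Distinct()
        .ToDictionary(word => word,
                      word => text.Select(array => (float)array.Count(x => x == word))
                                  .ToArray());

var doc = new List<string[]>
{
    new[] { "That", "is", "a", "cat" },
    new[] { "That", "bat", "flew", "over", "the", "cat" },
};

Dictionary<string, float[]> ret = Termfreq(doc);
foreach (KeyValuePair<string, float[]> kvp in ret)
    Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, string.Join(", ", kvp.Value));
```

This query rescans every array once per distinct word. That is fine for small inputs, but for large documents a single pass that looks each term up with Dictionary.TryGetValue avoids the repeated scans.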

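Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original post. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM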