
Split comma-separated string to count duplicates

I have the following data in my database (comma-separated strings):

"word, test, hello"
"test, lorem, word"
"test"
...
etc.

How can I transform this data into a Dictionary in which each string is split into its distinct words, with each word paired with the number of times it occurs? For example:

{"test", 3},  {"word", 2}, {"hello", 1}, {"lorem", 1}

I will have approximately 3000 rows of data, in case this makes a difference to any solution offered. I am also using .NET 3.5 (and would be interested to see a solution using LINQ).

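IEnumerable<string> strings = ...;

// Flatten every row into its comma-separated parts, group the parts by
// their trimmed value, and count the members of each group.
Dictionary<string,int> result = strings.SelectMany(s => s.Split(','))
                                       .GroupBy(s => s.Trim())
                                       .ToDictionary(g => g.Key, g => g.Count());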
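
For anyone who wants to try the LINQ approach end to end, here is a minimal self-contained sketch that uses the three sample rows from the question in place of the database query. The names WordCounter and CountWords are hypothetical, and skipping empty entries (for input such as "a,,b") is an added assumption rather than something the question asks for. It sticks to C# 3.0 features so it should build against .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Hypothetical helper: counts how often each trimmed word occurs across all rows.
    public static Dictionary<string, int> CountWords(IEnumerable<string> rows)
    {
        return rows.SelectMany(row => row.Split(','))
                   .Select(word => word.Trim())
                   .Where(word => word.Length > 0)   // assumption: ignore empty entries
                   .GroupBy(word => word)
                   .ToDictionary(group => group.Key, group => group.Count());
    }

    public static void Main()
    {
        // Sample rows from the question, standing in for the database query.
        string[] rows = { "word, test, hello", "test, lorem, word", "test" };

        Dictionary<string, int> counts = CountWords(rows);

        // Dictionary enumeration order is not guaranteed, so sort before printing.
        foreach (var pair in counts.OrderByDescending(p => p.Value).ThenBy(p => p.Key))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

Grouping is case-sensitive here. If "Test" and "test" should be counted together, both GroupBy and ToDictionary have overloads that accept an IEqualityComparer<string>, such as StringComparer.OrdinalIgnoreCase.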

Here is something like pseudocode (I haven't tried to compile it):

List<string> allRows = getFromDatabase();

var result = new Dictionary<string, int>();
foreach (string row in allRows)
{
   string[] words = row.Split(',');

   foreach (string rawWord in words)
   {
      // Trim so that " test" and "test" are counted as the same word.
      string word = rawWord.Trim();

      if (result.ContainsKey(word))
         result[word]++;
      else
         result.Add(word, 1);
   }
}


 