How to calculate the frequency of all the words in a text document?
using System;
using System.Collections.Generic;

// Counts how many times each key has been added.
class CounterDict<TKey>
{
    public Dictionary<TKey, int> _dict = new Dictionary<TKey, int>();

    public void Add(TKey key)
    {
        if (_dict.ContainsKey(key))
            _dict[key]++;      // seen before: increment its count
        else
        {
            _dict.Add(key, 1); // first occurrence
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        string line = "The woods decay the woods decay and fall.";
        CounterDict<string> freq = new CounterDict<string>();
        // Split() with no arguments splits on whitespace; punctuation stays attached ("fall.").
        foreach (string item in line.Split())
        {
            freq.Add(item.Trim().ToLower());
        }
        foreach (string key in freq._dict.Keys)
        {
            Console.WriteLine("{0}:{1}", key, freq._dict[key]);
        }
    }
}
I want to calculate the number of occurrences of every word in a string.
I think the code above will be slow at this task because of this part of the Add function:
if (_dict.ContainsKey(key))
    _dict[key]++;
else
{
    _dict.Add(key, 1);
}
Also, is keeping _dict public good practice? (I don't think it is.)
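One way to address the public-field question, sketched under the assumption that callers only need to read the counts: keep the dictionary private and hand out a read-only view. The class name Counter<TKey>, the Counts property, and the indexer are hypothetical names for this sketch; ReadOnlyDictionary is the standard wrapper type in System.Collections.ObjectModel.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical revision of CounterDict: same counting logic, but the
// underlying dictionary can no longer be modified from outside.
class Counter<TKey>
{
    // Private so callers cannot Add, Remove or overwrite counts directly.
    private readonly Dictionary<TKey, int> _dict = new Dictionary<TKey, int>();

    public Counter()
    {
        // Wraps (does not copy) _dict, so later Add calls show up here.
        Counts = new ReadOnlyDictionary<TKey, int>(_dict);
    }

    // Read-only view for enumeration and lookups.
    public IReadOnlyDictionary<TKey, int> Counts { get; }

    public void Add(TKey key)
    {
        if (_dict.ContainsKey(key))
            _dict[key]++;
        else
            _dict.Add(key, 1);
    }

    // Returns 0 for keys that were never added instead of throwing.
    public int this[TKey key] => _dict.TryGetValue(key, out int n) ? n : 0;
}
```

Callers can still iterate with foreach (var pair in counter.Counts), but the only way to change a count is through Add.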
How should I modify this, or rewrite it entirely, to do the job?
How about this:
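using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// OrdinalIgnoreCase so "The" and "the" count as the same word,
// matching the ToLower() call in the question.
Dictionary<string, int> words = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
string input = "The woods decay the woods decay and fall.";
// \w+ matches runs of word characters, so punctuation such as "." is dropped.
foreach (Match word in Regex.Matches(input, @"\w+", RegexOptions.ECMAScript))
{
    if (!words.ContainsKey(word.Value))
    {
        words.Add(word.Value, 1);
    }
    else
    {
        words[word.Value]++;
    }
}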
The main point was replacing .Split with a regular expression, so you don't need to keep a big string array in memory and can work with one item at a time.
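The "one item at a time" idea carries over to the text-document case in the title. A minimal sketch, assuming the file can be processed line by line: File.ReadLines enumerates lines lazily, so only the current line and the dictionary need to be in memory. The StreamingWordCount class and its method names are hypothetical, and the case-insensitive comparer is an assumption chosen to match the question's ToLower() call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: counts words while reading one line at a time.
static class StreamingWordCount
{
    // File.ReadLines yields lines as they are enumerated instead of
    // loading the whole file up front.
    public static Dictionary<string, int> CountWordsInFile(string path)
    {
        return CountWords(File.ReadLines(path));
    }

    // Counts \w+ matches across any sequence of lines.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        // Assumption: case-insensitive, like the question's ToLower() call.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            foreach (Match word in Regex.Matches(line, @"\w+"))
            {
                if (counts.ContainsKey(word.Value))
                    counts[word.Value]++;
                else
                    counts.Add(word.Value, 1);
            }
        }
        return counts;
    }
}
```

A call such as StreamingWordCount.CountWordsInFile("input.txt") (hypothetical path) would return the same kind of dictionary as the in-memory version. Because matching is per line, a word hyphenated across a line break is counted as two pieces.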
From the MSDN documentation:
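// When a program often has to try keys that turn out not to
// be in the dictionary, TryGetValue can be a more efficient
// way to retrieve values.
// (In the MSDN example, openWith is a Dictionary<string, string> mapping
// file extensions to programs; it is declared empty here so the snippet compiles.)
Dictionary<string, string> openWith = new Dictionary<string, string>();
string value = "";
if (openWith.TryGetValue("tif", out value))
{
    Console.WriteLine("For key = \"tif\", value = {0}.", value);
}
else
{
    Console.WriteLine("Key = \"tif\" is not found.");
}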
I haven't tested it myself, but it might improve your efficiency.
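Applied to word counting, the TryGetValue suggestion could look roughly like the sketch below. Like the answer above, it has not been benchmarked here, so any speed gain over ContainsKey is an assumption. TryGetValue leaves current at 0 when the word is missing, so a single indexer assignment handles both new and existing words. The class and method names are hypothetical, and the case-insensitive comparer is an assumption matching the question's ToLower().

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper showing TryGetValue in a word counter.
static class TryGetValueWordCount
{
    public static Dictionary<string, int> CountWords(string text)
    {
        // Assumption: "The" and "the" are the same word.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (Match word in Regex.Matches(text, @"\w+"))
        {
            // One lookup; current is 0 (the default) if the word is not present yet.
            counts.TryGetValue(word.Value, out int current);
            // One write that both inserts new words and updates existing ones.
            counts[word.Value] = current + 1;
        }
        return counts;
    }
}
```

Compared with ContainsKey followed by ++, this replaces the separate existence check and read with one TryGetValue call.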