
Count the occurrence of each unique word in the file (C#)

I am writing a simple console application that would allow me to count the occurrences of each unique word. For example, the console will allow the user to type a sentence; once they press Enter, the system should count the number of times each word occurs. So far I can only count characters. Any help would be appreciated.

class Program
{
    static void Main(string[] args)
    {

        Console.WriteLine("Please enter string");
        string input = Convert.ToString(Console.ReadLine());
        Dictionary<string, int> objdict = new Dictionary<string, int>();
        foreach (var j in input)
        {
            if (objdict.ContainsKey(j.ToString()))
            {
                objdict[j.ToString()] = objdict[j.ToString()] + 1;
            }
            else
            {
                objdict.Add(j.ToString(), 1);
            }
        }
        foreach (var temp in objdict)
        {
            Console.WriteLine("{0}:{1}", temp.Key, temp.Value);
        }
        Console.ReadLine();
    }
}

Try this method:

private static void countWordsInALine(string line, Dictionary<string, int> words)
{
    var wordPattern = new Regex(@"\w+");

    foreach (Match match in wordPattern.Matches(line))
    {
        int currentCount=0;
        words.TryGetValue(match.Value, out currentCount);

        currentCount++;
        words[match.Value] = currentCount;
    }
}

Call the above method like this:

var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

countWordsInALine(line, words);

In the words dictionary you will find each word (key) along with its occurrence count (value).
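Since the title asks about a file, here is a minimal sketch of how the method above could be applied line by line. The class name `FileWordCounter`, the helper `CountWordsInFile`, and the default path `input.txt` are assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class FileWordCounter
{
    // Same idea as the method above: every \w+ match counts as one word.
    private static readonly Regex WordPattern = new Regex(@"\w+");

    public static void CountWordsInALine(string line, Dictionary<string, int> words)
    {
        foreach (Match match in WordPattern.Matches(line))
        {
            // TryGetValue leaves currentCount at 0 when the word is new.
            int currentCount;
            words.TryGetValue(match.Value, out currentCount);
            words[match.Value] = currentCount + 1;
        }
    }

    // Hypothetical helper: counts the words of a whole text file.
    public static Dictionary<string, int> CountWordsInFile(string path)
    {
        var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

        // File.ReadLines reads lazily, so the file is never held in memory at once.
        foreach (string line in File.ReadLines(path))
        {
            CountWordsInALine(line, words);
        }
        return words;
    }

    public static void Main(string[] args)
    {
        // "input.txt" is an assumed default, used only when no argument is given.
        string path = args.Length > 0 ? args[0] : "input.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: {0}", path);
            return;
        }

        foreach (var pair in CountWordsInFile(path))
        {
            Console.WriteLine("{0}:{1}", pair.Key, pair.Value);
        }
    }
}
```

Because the dictionary is created with a case-insensitive comparer, "The" and "the" are counted as the same word; drop the comparer if you want them counted separately.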

Just call the Split method, passing a single space (assuming words are separated by a single space). It returns a collection of the words; then iterate over each element of that collection with the same logic you already had.

class Program
{
    static void Main(string[] args)
    {

        Console.WriteLine("Please enter string");
        string input = Convert.ToString(Console.ReadLine());
        Dictionary<string, int> objdict = new Dictionary<string, int>();
        foreach (var j in input.Split(' '))
        {
            if (objdict.ContainsKey(j))
            {
                objdict[j] = objdict[j] + 1;
            }
            else
            {
                objdict.Add(j, 1);
            }
        }
        foreach (var temp in objdict)
        {
            Console.WriteLine("{0}:{1}", temp.Key, temp.Value);
        }
        Console.ReadLine();
    }
}

You need to split the string on spaces (or any other characters which you consider to delimit words). Try changing the loop to this:

foreach (string Word in input.Split(' ')) {
    // count Word here, using the same dictionary logic as in your original loop
}
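To illustrate the "any other characters" part, here is a rough sketch that splits on several delimiters at once. The delimiter set and the names `SplitWordCounter` and `CountWords` are assumptions; adjust the set to whatever you consider a word boundary.

```csharp
using System;
using System.Collections.Generic;

public static class SplitWordCounter
{
    // Assumed delimiter set; extend it to match your input.
    private static readonly char[] Delimiters = { ' ', '\t', ',', '.', ';', ':', '!', '?' };

    public static Dictionary<string, int> CountWords(string input)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        // RemoveEmptyEntries drops the empty strings that consecutive
        // delimiters (e.g. two spaces, or ", ") would otherwise produce.
        foreach (string word in input.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries))
        {
            int current;
            counts.TryGetValue(word, out current); // current stays 0 for a new word
            counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        Console.WriteLine("Please enter string");
        string input = Console.ReadLine() ?? string.Empty;

        foreach (var pair in CountWords(input))
        {
            Console.WriteLine("{0}:{1}", pair.Key, pair.Value);
        }
    }
}
```

Without `StringSplitOptions.RemoveEmptyEntries`, an input such as "one  two" (two spaces) would yield an empty string between the words, and that empty string would be counted as a word of its own.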

Try this...

var theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => txt)
    .Where(grouping => grouping.Count() > 1)
    .ToList()
    .ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
         groupItem.Key,
         groupItem.Count(),
         string.Join(" ", groupItem.ToArray())));
Console.ReadKey();

http://omegacoder.com/?p=792
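Note that the snippet above only reports words that occur more than once. If you want a count for every word in a typed sentence, a possible variation of the same GroupBy idea is sketched below; the names `LinqWordCounter` and `CountWords` are made up for this example, and splitting on a single space is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqWordCounter
{
    public static Dictionary<string, int> CountWords(string input)
    {
        return input
            // Assumes words are separated by spaces; empty entries are skipped.
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Group identical words together, ignoring case.
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            // The group size is the number of occurrences of that word.
            .ToDictionary(group => group.Key, group => group.Count(),
                          StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine("Please enter string");
        string input = Console.ReadLine() ?? string.Empty;

        foreach (var pair in CountWords(input))
        {
            Console.WriteLine("{0}:{1}", pair.Key, pair.Value);
        }
    }
}
```

This removes the `Where(grouping => grouping.Count() > 1)` filter, so words that appear only once are listed too.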

Might I suggest a ternary search tree to make things efficient?

Here's a link to a C# implementation: http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/

After first inserting the words into the tree, you could simply call "Contains", combined with one of the approaches above, to make lookups quick.
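For readers who want to see the shape of the idea, here is a minimal sketch of a ternary search tree that stores a count at the node where each word ends. It is not the implementation from the linked article; the class `CountingTernaryTree` and its members `Add`, `GetCount`, and `Contains` are hypothetical names chosen for this example.

```csharp
using System;

public sealed class CountingTernaryTree
{
    private sealed class Node
    {
        public readonly char Ch;
        public int Count;             // how many times a word ending at this node was added
        public Node Low, Equal, High; // smaller char, next char of the word, larger char
        public Node(char ch) { Ch = ch; }
    }

    private Node root;

    // Adds one occurrence of the word; null or empty strings are ignored.
    public void Add(string word)
    {
        if (string.IsNullOrEmpty(word)) return;
        root = Add(root, word, 0);
    }

    private static Node Add(Node node, string word, int index)
    {
        char ch = word[index];
        if (node == null) node = new Node(ch);

        if (ch < node.Ch) node.Low = Add(node.Low, word, index);
        else if (ch > node.Ch) node.High = Add(node.High, word, index);
        else if (index < word.Length - 1) node.Equal = Add(node.Equal, word, index + 1);
        else node.Count++; // last character of the word: record the occurrence
        return node;
    }

    // Returns how many times the word was added (0 if never).
    public int GetCount(string word)
    {
        if (string.IsNullOrEmpty(word)) return 0;

        Node node = root;
        int index = 0;
        while (node != null)
        {
            char ch = word[index];
            if (ch < node.Ch) node = node.Low;
            else if (ch > node.Ch) node = node.High;
            else if (index < word.Length - 1) { node = node.Equal; index++; }
            else return node.Count;
        }
        return 0;
    }

    public bool Contains(string word)
    {
        return GetCount(word) > 0;
    }
}

public static class TernaryTreeDemo
{
    public static void Main()
    {
        var tree = new CountingTernaryTree();
        foreach (string word in "the cat saw the other cat".Split(' '))
        {
            tree.Add(word);
        }
        Console.WriteLine("the:{0}", tree.GetCount("the"));
        Console.WriteLine("cat:{0}", tree.GetCount("cat"));
    }
}
```

This sketch is case-sensitive and unbalanced, so treat it as a starting point only. For typed sentences, the Dictionary-based answers above are simpler; a tree like this mainly becomes interesting if you also need prefix lookups.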
