
Count a specific word in a text file in C#

If I have a text file containing

"dont run if you cant hide, or you will be broken in two strings, your a evil man"

and I want to count how many times the word "you" appears in the text file, and put that value into an int variable.

How do I go about doing something like that?

To say it with a Regex:

Console.WriteLine(new Regex(@"(?i)you").Matches("dont run if you cant hide, or you will be broken in two strings, your a evil man").Count);

Or, if you need the word "you" as a stand-alone word:

Console.WriteLine(new Regex(@"(?i)\byou\b").Matches("dont run if you cant hide, or you will be broken in two strings, your a evil man").Count);

Edit: Replaced \s+you\s+ with (?i)\byou\b for the sake of correctness.

string s = "dont run if you cant hide, or you will be broken in two strings, your a evil man";
var wordCounts = from w in s.Split(' ')
                 group w by w into g
                 select new { Word = g.Key, Count = g.Count() };

int youCount = wordCounts.Single(w => w.Word == "you").Count;
Console.WriteLine(youCount);

Ideally punctuation should be ignored. I'll let you handle a messy detail like that.
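One way to handle that punctuation detail, as a minimal sketch: split on runs of non-word characters instead of on spaces alone, and compare case-insensitively. The names WordCounter and CountWord are made up for illustration and do not come from any answer here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordCounter
{
    // Hypothetical helper: counts whole-word, case-insensitive occurrences of `word`.
    public static int CountWord(string text, string word)
    {
        // \W+ matches runs of non-word characters (spaces, commas, periods, ...),
        // so "strings," yields the token "strings" rather than "strings,".
        return Regex.Split(text, @"\W+")
                    .Count(w => string.Equals(w, word, StringComparison.OrdinalIgnoreCase));
    }
}

class Program
{
    static void Main()
    {
        string s = "dont run if you cant hide, or you will be broken in two strings, your a evil man";
        Console.WriteLine(WordCounter.CountWord(s, "you")); // prints 2
    }
}
```

Note that this also splits on apostrophes, so "you're" would count as one occurrence of "you". Whether that is desirable depends on the text.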

Assuming the file has regular line breaks, reading it line by line uses less memory on a huge file than some of the other approaches here. This uses Jason's counting method:

var total = 0;
using (StreamReader sr = new StreamReader("log.log"))
{
    while (!sr.EndOfStream)
    {
        var counts = sr
            .ReadLine()
            .Split(' ')
            .GroupBy(s => s)
            .Select(g => new { Word = g.Key, Count = g.Count() });
        var wc = counts.SingleOrDefault(c => c.Word == "you");
        total += (wc == null) ? 0 : wc.Count;
    }
}

Or, combining Scoregraphic's answer here with an IEnumerable method:

static IEnumerable<string> Lines(string filename)
{
    using (var sr = new StreamReader(filename))
    {
        while (!sr.EndOfStream)
        {
            yield return sr.ReadLine();
        }
    }
}

You could get a nifty one-liner:

Lines("log.log").Select(line => Regex.Matches(line, @"(?i)\byou\b").Count).Sum();

[Edited because System.IO.File now supports enumerating the lines of a file, removing the need for the hand-rolled method described above.]

Or, using the framework method File.ReadLines(), you could reduce this to:

File.ReadLines("log.log").Select(line => Regex.Matches(line, @"(?i)\byou\b").Count).Sum();

Reading from a file:

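int count;

using (StreamReader reader = File.OpenText("fileName"))
{
   string contents = reader.ReadToEnd();
   // Verbatim string (@) so that \b reaches the regex engine as a word boundary
   // instead of being compiled as a backspace character.
   MatchCollection matches = Regex.Matches(contents, @"\byou\b");
   count = matches.Count;
}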

Note that the pattern "\byou\b" matches only the word "you" by itself. If you want to match "you" inside other words (for example, the "you" in "your"), use "you" as the pattern instead of "\byou\b".

Try regular expressions:

Regex r = new Regex("test");
MatchCollection matches = r.Matches("this is a test of using regular expressions to count how many times test is said in a string");
int iCount = matches.Count;

The following method will do the job.

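public Int32 GetWordCountInFile(String fileName, String word, Boolean ignoreCase)
{
    // String.Compare returns an int, so it cannot serve as a Count predicate;
    // String.Equals with an explicit comparison returns the bool that is needed.
    var comparison = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
    return File
        .ReadAllText(fileName)
        .Split(new [] { ' ', '.', ',' })
        .Count(w => String.Equals(w, word, comparison));
}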

You may have to add a few other separators to the String.Split() call.

Try counting the occurrences using IndexOf, moving past each match to find the next one. For example:

using System;

namespace CountOcc
{
 class Program
 {
  public static void Main(string[] args)
  {
   int         StartPos; // Current pos in file.

   System.IO.StreamReader sr = new System.IO.StreamReader( "c:\\file.txt" );
   String Str = sr.ReadToEnd();

   int Count = 0;
   StartPos = 0;
   do
   {
    StartPos = Str.IndexOf( "Services", StartPos );
    if ( StartPos >= 0 )
    {
     StartPos++;
     Count++;
    }
   } while ( StartPos >= 0 );

   Console.Write("File contained " + Count + " occurrences");
   Console.ReadKey(true);
  }
 }
}
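The same IndexOf loop can be wrapped in a reusable method. This is only a sketch: SubstringCounter and CountOccurrences are hypothetical names, and unlike the program above it advances by the length of the search string, so overlapping matches are not counted.

```csharp
using System;

static class SubstringCounter
{
    // Hypothetical helper: counts non-overlapping occurrences of `value` in `text`.
    public static int CountOccurrences(string text, string value, StringComparison comparison)
    {
        // An empty search string would match at every position; treat it as zero matches.
        if (string.IsNullOrEmpty(value)) return 0;

        int count = 0;
        int pos = 0;
        while ((pos = text.IndexOf(value, pos, comparison)) >= 0)
        {
            count++;
            pos += value.Length; // continue searching after this match
        }
        return count;
    }
}
```

Called as SubstringCounter.CountOccurrences(text, "you", StringComparison.Ordinal) on the sentence from the question, this returns 3, because it is a plain substring search and therefore also counts the "you" inside "your".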
