Counting a Word in a Text with REGEX

Hello :) I have to find all occurrences of a given word in a text, with the following restrictions:

- Matching should be case-insensitive.
- Not every matching substring is a word; only whole words should be counted.
- A word is a sequence of letters delimited by punctuation or by the start/end of the text.
- The output should be a single integer.

I have already solved it with StringComparison and a for-loop.

The code below is my attempt to do it with REGEX (C#). It gives me the number of times the pattern word occurs, but it does not take the restrictions into account.

Could you give me some tips on how to improve my REGEX pattern?

string patternWord = Console.ReadLine();
string[] inputSentence = Console.ReadLine().Split();
int count = 0;
string pattern = @"(?:\b\w+\ \s|\S)*" + patternWord + @"(?:\b\w+\b\ \s|\S)?";
Regex rx = new Regex(pattern, RegexOptions.IgnoreCase);
for (int i = 0; i < inputSentence.Length; i++)
{
    var mc = rx.Matches(inputSentence[i]);
    foreach (Match m in mc)
    {
        count++;
    }
}
Console.WriteLine("{0}", count);

EDIT:

Example:

Input word - hi

Input sentence - Hidden networks say “Hi” only to Hitachi devices. Hi, said Matuhi. HI!

I only need the bold ones, i.e. the three standalone occurrences: “Hi”, Hi and HI.

EDIT 2: I have edited the restrictions as well.

How about a simple word-boundary regex?

\bhi\b

In C# this would be implemented like this:

private static int WordCount(string word, string text)
{
    // Regex.Escape keeps any regex metacharacters in the word literal.
    var regex = new Regex(string.Format(@"\b{0}\b", Regex.Escape(word)),
                      RegexOptions.IgnoreCase);
    return regex.Matches(text).Count;
}
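
One caveat with \b: it treats digits and the underscore as word characters, so hi2 or hi_5 would not be counted as the word hi, whereas the question defines a word as a run of letters. If that matters, letter-based lookarounds can stand in for \b. A minimal sketch, assuming the letters-only definition from the question (the names WordCounter and CountWord are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Counts case-insensitive occurrences of `word` that are not directly
    // preceded or followed by another letter.
    public static int CountWord(string word, string text)
    {
        // Regex.Escape makes metacharacters in the user's word match literally.
        string pattern = @"(?<!\p{L})" + Regex.Escape(word) + @"(?!\p{L})";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        string text = "Hidden networks say “Hi” only to Hitachi devices. Hi, said Matuhi. HI!";
        Console.WriteLine(CountWord("hi", text)); // prints 3
    }
}
```

On the example sentence from the question this counts “Hi”, Hi and HI, and skips Hidden, Hitachi and Matuhi because a letter sits right next to the match.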

Sorry for not answering your exact question, but why use regex? LINQ and a few utility methods from the Char class should suffice for this:

using System;
using System.Collections.Generic;
using System.Linq;

public class Test
{
    static void Main(string[] args)
    {
        string patternWord = Console.ReadLine();
        string inputSentence = Console.ReadLine();
        var words = GetWords(inputSentence);
        var count = words.Count(word => string.Equals(patternWord, word, StringComparison.InvariantCultureIgnoreCase));
        Console.WriteLine(count);
        Console.ReadLine();
    }

    private static IEnumerable<string> GetWords(string sentence)
    {
        while (!string.IsNullOrEmpty(sentence))
        {
            var word = new string(sentence.TakeWhile(Char.IsLetterOrDigit).ToArray());
            if (word.Length > 0) // the run is empty when the text starts with a non-word character
                yield return word;
            sentence = new string(sentence.Skip(word.Length).SkipWhile(c => !Char.IsLetterOrDigit(c)).ToArray());
        }
    }
}
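
Note that GetWords rebuilds the remaining string on every iteration, which is fine for one line of console input but grows quadratically on long texts. A single-pass variant could look roughly like this; it is only a sketch, WordSplitter and SplitWords are illustrative names, and it keeps the same letter-or-digit definition of a word:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class WordSplitter
{
    // Yields each maximal run of letters/digits, scanning the text once.
    public static IEnumerable<string> SplitWords(string text)
    {
        var current = new StringBuilder();
        foreach (char c in text ?? string.Empty)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // A separator ends the current word.
                yield return current.ToString();
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            yield return current.ToString(); // last word, if the text ends with one
        }
    }

    public static void Main()
    {
        string text = "Hidden networks say “Hi” only to Hitachi devices. Hi, said Matuhi. HI!";
        int count = SplitWords(text)
            .Count(w => string.Equals(w, "hi", StringComparison.OrdinalIgnoreCase));
        Console.WriteLine(count); // prints 3
    }
}
```

The comparison uses OrdinalIgnoreCase here as a simplifying assumption; swap in InvariantCultureIgnoreCase if culture-aware matching is needed.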
