
Searching a string for a specific word (C#)

I would like to search a string for a specific word that the user types in, and then output the percentage of the text that this word makes up. I'm wondering what the best method for this would be, and I'd appreciate any help.

I suggest using the String.Equals overload that takes a StringComparison, for better performance (no lower-cased copy of each word has to be created):

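using System;
using System.Linq;

var separators = new[] { ' ', ',', '.', '?', '!', ';', ':', '"' };
// RemoveEmptyEntries drops the empty strings produced by adjacent separators such as ", "
var words = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries);
var matches = words.Count(w =>
    w.Equals(searchedWord, StringComparison.OrdinalIgnoreCase));
// Arrays expose Length; the float cast avoids integer division
var percentage = matches / (float) words.Length;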
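A minimal sketch that wraps the snippet above in a reusable helper so it can be run end to end; the helper name WordFraction, its empty-input guard and the sample sentences are assumptions for illustration. The first usage line also shows why RemoveEmptyEntries matters: splitting "Yes, no" on ' ' and ',' without it yields three entries, one of them empty, which would inflate the word total.

```csharp
using System;
using System.Linq;

// Hypothetical helper: fraction of the words in `sentence` that equal
// `searchedWord`, ignoring case (e.g. 0.5 for 50%).
static float WordFraction(string sentence, string searchedWord)
{
    var separators = new[] { ' ', ',', '.', '?', '!', ';', ':', '"' };
    var words = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    if (words.Length == 0) return 0f; // avoid 0 / 0 (NaN) for empty input
    var matches = words.Count(w =>
        w.Equals(searchedWord, StringComparison.OrdinalIgnoreCase));
    return matches / (float) words.Length;
}

// Without RemoveEmptyEntries, "Yes, no" splits into "Yes", "" and "no"
var raw = "Yes, no".Split(new[] { ' ', ',' });
Console.WriteLine(raw.Length); // 3

// 2 of the 4 words are "yes" (case-insensitively)
Console.WriteLine(WordFraction("Yes, no. Yes? No!", "yes"));
```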

Note that percentage will be a float, e.g. 0.5 for 50%.
You can format it for display using the ToString overload that takes a format string:

var formatted = percentage.ToString("P0"); // 0.1234 => 12 % (percent symbol and spacing depend on the current culture)

You can also change the format specifier to show decimal places:

var formatted = percentage.ToString("P2"); // 0.1234 => 12.34 % (symbol, spacing and decimal separator depend on the current culture)
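Because the "P" specifiers take the percent symbol, spacing and decimal separator from the current culture, the same value can print differently on different machines. If fixed output is needed, one option, sketched here as an alternative rather than the answer's method, is a custom numeric format with an explicit culture: in a custom format the % character multiplies the value by 100 and is placed exactly where it appears.

```csharp
using System;
using System.Globalization;

double percentage = 0.1234;

// '%' in a custom format multiplies by 100; InvariantCulture pins the
// decimal separator to '.' regardless of the machine's regional settings
var whole = percentage.ToString("0%", CultureInfo.InvariantCulture);        // "12%"
var twoPlaces = percentage.ToString("0.00%", CultureInfo.InvariantCulture); // "12.34%"

Console.WriteLine(whole);
Console.WriteLine(twoPlaces);
```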

Please keep in mind that the Split-based approach is inefficient for large strings, because it creates a string instance for every word in the text. You might want to use a StringReader instead and read the text word by word manually.
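A minimal sketch of reading word by word without allocating a string per word. It assumes a "word" is a run of letters, digits or apostrophes, and the helper names CountWord and IsWordChar are hypothetical. It reads one character at a time from any TextReader, so it works with a StringReader over a string as well as with a StreamReader over a large file, and it compares characters as they arrive.

```csharp
using System;
using System.IO;

// Usage: for a large file, pass new StreamReader(path) instead
var (found, totalWords) = CountWord(new StringReader("The cat and the hat."), "the");
Console.WriteLine($"{found} of {totalWords} words"); // 2 of 5 words

// Returns how many words equal `searchedWord` (ignoring case) and how many
// words were read in total. No string is created per word.
static (int Matches, int Total) CountWord(TextReader reader, string searchedWord)
{
    int matches = 0, total = 0;
    int pos = 0;               // characters of the current word compared so far
    bool stillMatching = true; // current word is still a prefix of searchedWord
    bool inWord = false;

    while (true)
    {
        int c = reader.Read(); // -1 at end of input
        if (c != -1 && IsWordChar((char)c))
        {
            inWord = true;
            if (stillMatching && pos < searchedWord.Length &&
                char.ToUpperInvariant((char)c) == char.ToUpperInvariant(searchedWord[pos]))
                pos++;
            else
                stillMatching = false;
            continue;
        }

        if (inWord) // a word just ended: count it, and count a match if it was complete
        {
            total++;
            if (stillMatching && pos == searchedWord.Length) matches++;
        }
        if (c == -1) break;

        // Reset the per-word state for the next word
        inWord = false;
        pos = 0;
        stillMatching = true;
    }
    return (matches, total);
}

// Assumption: a word is a run of letters, digits or apostrophes
static bool IsWordChar(char ch) => char.IsLetterOrDigit(ch) || ch == '\'';
```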

The easiest way is to use LINQ:

using System;
using System.Linq;

char[] separators = new char[] {' ', ',', '.', '?', '!', ':', ';'};
var count =
    (from word in sentence.Split(separators)        // get all the words
     where word.ToLower() == searchedWord.ToLower() // find the words that match
     select word).Count();                          // count them

This only counts the number of times the word appears in the text. You could also count how many words there are in the text (RemoveEmptyEntries keeps adjacent separators such as ", " from producing empty "words"):

var totalWords = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;

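and then get the percentage. Multiply by 100.0 before dividing: with two ints, count / totalWords is integer division and yields 0 unless every word matches.

var result = count * 100.0 / totalWords;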
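With small illustrative numbers (2 matching words out of 5, chosen only for this example), this is what integer division does to the percentage compared with dividing as double:

```csharp
using System;

// Illustrative values: 2 matching words in a 5-word text
int count = 2, totalWords = 5;

var truncated = count / totalWords * 100; // int division: 2 / 5 == 0, so 0 * 100 == 0
var result = count * 100.0 / totalWords;  // 200.0 / 5 == 40.0

Console.WriteLine(truncated); // 0
Console.WriteLine(result);    // 40
```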

My suggestion is a complete class:

using System;
using System.Text;

class WordCount {
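    // Characters replaced by spaces before counting; line breaks are
    // included so that words at the end of a line are still found
    const string Symbols = ",;.:-()\t\r\n!¡¿?\"[]{}&<>+-*/=#'";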

    public static string normalize(string str)
    {
        var toret = new StringBuilder();

        for(int i = 0; i < str.Length; ++i) {
            if ( Symbols.IndexOf( str[ i ] ) > -1 ) {
                toret.Append( ' ' );
            } else {
                toret.Append( char.ToLower( str[ i ] ) );
            }
        }

        return toret.ToString();
    }

    private string word;
    public string Word {
        get { return this.word; }
        set { this.word = value; }
    }

    private string str;
    public string Str {
        get { return this.str; }
    }

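    private string[] words = null;
    public string[] Words {
        get {
            // Split lazily and only once; RemoveEmptyEntries drops the empty
            // strings produced by the padding and by consecutive spaces
            if ( this.words == null ) {
                this.words = this.Str.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries );
            }

            return this.words;
        }
    }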

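    public WordCount(string str, string w)
    {
        // Pad with spaces so that the first and last words are also
        // surrounded by spaces, as Times() expects
        this.str = ' ' + normalize( str ) + ' ';
        this.word = w;
    }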

    public int Times()
    {
        return this.Times( this.Word );
    }

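    public int Times(string word)
    {
        int times = 0;

        // Normalize the search word the same way as the text, then pad it
        // with spaces so that only whole words match
        word = ' ' + normalize( word ).Trim() + ' ';

        int wordLength = word.Length;
        int pos = this.Str.IndexOf( word, StringComparison.Ordinal );

        while( pos > -1 ) {
            ++times;

            // Resume at the trailing space, since it can also be the leading
            // space of the next occurrence (e.g. "the the")
            pos = this.Str.IndexOf( word, pos + wordLength - 1, StringComparison.Ordinal );
        }

        return times;
    }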

    public double Percentage()
    {
        return this.Percentage( this.Word );
    }

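    public double Percentage(string word)
    {
        // Fraction of the words that match, e.g. 0.25 for 25%;
        // the cast avoids integer division, which would truncate to 0
        return (double) this.Times( word ) / this.Words.Length;
    }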
}

Advantages: the string splitting is cached, so it is never performed more than once. Everything is packaged in one class, so it is easily reusable. No need for LINQ. Hope this helps.

using System;
using System.Linq;
using System.Text.RegularExpressions;

// The words you want to search for
var words = new string[] { "this", "is" };

// Build a regular expression such as \b(this|is)\b; Regex.Escape guards
// against search words that contain regex metacharacters
var wordRegexQuery = "\\b(" + string.Join("|", words.Select(Regex.Escape)) + ")\\b";

// Find matches and return them as a string[]
var regex = new Regex(wordRegexQuery, RegexOptions.IgnoreCase);
var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa.";
var matches = (from Match m in regex.Matches(someText) select m.Value).ToArray();

// Display results. Each percentage is relative to the total number of
// matches of the searched words, not to all the words in the text
foreach (var word in words)
{
    var wordCount = matches.Count(w => w.Equals(word, StringComparison.InvariantCultureIgnoreCase));
    Console.WriteLine("{0}: {1} ({2:f2}%)", word, wordCount, wordCount * 100f / matches.Length);
}
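The question asks for the share of the whole text rather than the share among the searched words. A minimal sketch of that variant, reusing the same regex idea: the helper name WordPercent is hypothetical, and it assumes (loudly) that every run of word characters matched by \b\w+\b counts as one word.

```csharp
using System;
using System.Text.RegularExpressions;

var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa.";
foreach (var word in new[] { "this", "is" })
{
    Console.WriteLine("{0}: {1:f2}%", word, WordPercent(someText, word));
}

// Percentage of all words in `text` that equal `word`, ignoring case.
// Assumption: every run of word characters (\w+) counts as one word.
static double WordPercent(string text, string word)
{
    int totalWords = Regex.Matches(text, @"\b\w+\b").Count;
    if (totalWords == 0) return 0;
    // Regex.Escape keeps characters such as '.' or '+' in the word literal
    int wordCount = Regex.Matches(text, @"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase).Count;
    return wordCount * 100.0 / totalWords;
}
```

With the sample text there are 20 words, so "is" (3 occurrences) comes out at 15% and "this" (1 occurrence) at 5%.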

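Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.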

 
© 2020-2024 STACKOOM.COM