Find all words without figures using RegEx

I found this code to get all the words of a string:

static string[] GetWords(string input)
{
    MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");

    var words = from m in matches.Cast<Match>()
                where !string.IsNullOrEmpty(m.Value)
                select TrimSuffix(m.Value);

    return words.ToArray();
}

static string TrimSuffix(string word)
{
    int apostrapheLocation = word.IndexOf('\'');
    if (apostrapheLocation != -1)
    {
        word = word.Substring(0, apostrapheLocation);
    }

    return word;
}
  1. Please explain what this code does.
  2. How can I get words without figures?

2. How can I get words without figures?

You'll have to replace \w with [A-Za-z].

Your RegEx then becomes @"\b[A-Za-z']*\b".

Then you'll have to think about TrimSuffix(). The RegEx allows apostrophes, but TrimSuffix() keeps only the part to the left of the apostrophe. So "it's" will become "it".
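The replacement suggested above can be sketched as follows. This is a minimal illustration, not the original poster's final code: the class name `WordExtractor` and the method name `GetWordsWithoutDigits` are made up for this example, and the call to TrimSuffix() is left out on purpose so that "it's" survives intact.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordExtractor   // hypothetical name, not from the question
{
    // Same idea as GetWords, but the character class only admits
    // ASCII letters and apostrophes, so digits are no longer allowed.
    public static string[] GetWordsWithoutDigits(string input)
    {
        return Regex.Matches(input, @"\b[A-Za-z']*\b")
                    .Cast<Match>()
                    .Where(m => !string.IsNullOrEmpty(m.Value)) // skip empty matches
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Because \b still treats digits and underscores as word characters, a token such as "abc123" produces no non-empty match at all instead of being cut down to "abc". Note that [A-Za-z] also rejects accented letters; if those should count as letters, the class [\p{L}'] is one possible alternative.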

In

MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");

the code uses a regex that looks for any word. \b is a word boundary and \w is the shorthand "word character" class: in .NET it matches letters (with or without accents), decimal digits and connector punctuation such as the underscore. The ' is simply added to the set alongside those characters. So the pattern finds the beginning and the end of each word and selects everything in between. Because * also accepts zero characters, the pattern can produce empty matches too.

Then
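To see what the pattern returns before any filtering, a small sketch like the one below can help. The class name `RegexDemo` and the method name `RawMatches` are invented for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexDemo   // hypothetical name, used only for this illustration
{
    // Returns every raw match value of the question's pattern,
    // including the empty strings that the * quantifier allows.
    public static string[] RawMatches(string input)
    {
        return Regex.Matches(input, @"\b[\w']*\b")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For an input such as "don't 42", the result contains "don't" and "42" along with empty strings matched at word boundaries, which is why the calling code has to filter out empty values.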

var words = from m in matches.Cast<Match>()
                    where !string.IsNullOrEmpty(m.Value)
                    select TrimSuffix(m.Value);

is LINQ query syntax, which lets you write SQL-like queries inside your code. It takes every match from the regex, skips the empty ones, and passes each remaining value through TrimSuffix. It's also where you can add your check for figures.

And this:
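One way to add that check inside the query is a second where clause. This is a sketch under assumptions: the names `DigitFilter` and `GetWordsWithoutFigures` are hypothetical, the pattern is left exactly as in the question, and TrimSuffix is omitted to keep the example short.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DigitFilter   // hypothetical name, not from the question
{
    public static string[] GetWordsWithoutFigures(string input)
    {
        var words = from m in Regex.Matches(input, @"\b[\w']*\b").Cast<Match>()
                    where !string.IsNullOrEmpty(m.Value)
                    where !m.Value.Any(char.IsDigit) // drop any word containing a digit
                    select m.Value;

        return words.ToArray();
    }
}
```

Unlike changing the character class, this keeps accented letters and underscores, since \w still matches them and only words containing a digit are rejected.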

static string TrimSuffix(string word)
    {
        int apostrapheLocation = word.IndexOf('\'');
        if (apostrapheLocation != -1)
        {
            word = word.Substring(0, apostrapheLocation);
        }

        return word;
    }

removes the ' from words that contain one and keeps only the part before it,

i.e. for the word don't it returns only don.
