
C# regex pattern for getting words

I am trying to figure out a pattern that will get words from a string. Say, for instance, my string is:

string text = "HI/how.are.3.a.d.you.&/{}today 2z3";

I tried to eliminate anything that is only one letter or digit long, but it doesn't work:

Regex.Split(text, @"\b\w{1,1}\b");

I also tried this:

Regex.Split(text, @"\W+");

But it outputs:

"HI how are ad you today"

I just want to get all the words so that my final string is:

"HI how are you today"

To get all words that are at least 2 characters long, you can use this pattern: \b[a-zA-Z]{2,}\b .

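using System;
using System.Linq;                     // for Cast<Match>() and Select()
using System.Text.RegularExpressions;

string text = "HI/how.are.3.a.d.you.&/{}today 2z3";
var matches = Regex.Matches(text, @"\b[a-zA-Z]{2,}\b");
string result = String.Join(" ", matches.Cast<Match>().Select(m => m.Value));
Console.WriteLine(result);             // prints: HI how are you today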

As others have pointed out in the comments, "A" and "I" are valid words. In case you decide to match those, you can use this pattern instead:

var matches = Regex.Matches(text, @"\b(?:[a-z]{2,}|[ai])\b",
                            RegexOptions.IgnoreCase);
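
To see what the second pattern does with the sample string from the question, here is a minimal sketch. The class name `SingleLetterDemo` and the helper `JoinMatches` are hypothetical names introduced only for this illustration. Note that the stray "a" in ".a." is itself a whole word under this pattern, so it shows up in the result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SingleLetterDemo
{
    // Hypothetical helper: joins every match of the pattern with a single space.
    public static string JoinMatches(string input, string pattern, RegexOptions options)
    {
        var matches = Regex.Matches(input, pattern, options);
        return string.Join(" ", matches.Cast<Match>().Select(m => m.Value));
    }

    static void Main()
    {
        string text = "HI/how.are.3.a.d.you.&/{}today 2z3";

        // Words of 2+ letters, plus the one-letter words "a" and "i" (any case).
        string result = JoinMatches(text, @"\b(?:[a-z]{2,}|[ai])\b", RegexOptions.IgnoreCase);

        Console.WriteLine(result); // prints: HI how are a you today
    }
}
```

Whether that extra "a" is wanted depends on the input, so the first pattern may be the safer choice for strings like the one in the question.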

In both patterns I've used \b to match word boundaries. Because digits count as word characters, input such as "1abc2" has no boundary between "1" and "a", so "abc" wouldn't be matched. If you want it to be matched, remove the \b metacharacters. Doing so for the first pattern is straightforward. The second pattern would change to [a-z]{2,}|[ai] .
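
The effect of the word boundaries on "1abc2" can be checked with a short sketch. The class name `BoundaryDemo` and the method `FirstWord` are hypothetical, chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
    // Hypothetical helper: returns the first match, or an empty string if none.
    public static string FirstWord(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    static void Main()
    {
        string input = "1abc2";

        // With \b: digits are word characters, so there is no boundary
        // between "1" and "a" or between "c" and "2", and nothing matches.
        string withBoundaries = FirstWord(input, @"\b[a-zA-Z]{2,}\b");

        // Without \b: the run of letters is matched wherever it occurs.
        string withoutBoundaries = FirstWord(input, @"[a-zA-Z]{2,}");

        Console.WriteLine("with \\b:    \"" + withBoundaries + "\"");    // prints: with \b:    ""
        Console.WriteLine("without \\b: \"" + withoutBoundaries + "\""); // prints: without \b: "abc"
    }
}
```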


 