
C# Regex: only letters followed by an optional

I am looking for a way to get words out of a sentence. I have gotten pretty far with the following expression:

\b([a-zA-Z]+?)\b

but there are some cases where it counts a word when I don't want it to, e.g. a word followed by more than one period, like "text..". So, in my regex, I want the period to appear at the end of a word zero or one time. Inserting \.? did not do the trick, and variations on this have not yielded anything fruitful either.

Hope someone can help!

A single dot means any character. You must escape it as

\.?

Maybe you want an expression like this:

\w+\.?

or

\p{L}+\.?
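As a rough sketch of how the two suggestions above behave in C#, the snippet below runs both against a made-up sample sentence. The class name `OptionalPeriodDemo`, the helper `FindAll`, and the sample text are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionalPeriodDemo
{
    // Hypothetical helper: returns every match of `pattern` in `input` as strings.
    public static string[] FindAll(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string sample = "One word. Then text.. and more";

        // \w accepts letters, digits and underscores.
        Console.WriteLine(string.Join("|", FindAll(@"\w+\.?", sample)));
        // prints: One|word.|Then|text.|and|more

        // \p{L} accepts any Unicode letter and nothing else.
        Console.WriteLine(string.Join("|", FindAll(@"\p{L}+\.?", sample)));
        // prints: One|word.|Then|text.|and|more
    }
}
```

Note that both patterns still match "text." inside "text..", so on their own they do not exclude a word followed by two periods.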

You need to add \.? (and not .?) because the period has special meaning in regexes.
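In C# source code the backslash also has to survive the string literal, so the escaped period is written either in a verbatim string (`@"\.?"`) or with a doubled backslash (`"\\.?"`). The following is a minimal sketch of that difference; the class `EscapeDemo`, its members, and the sample input are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Verbatim string: the regex engine sees \.? exactly as typed.
    public const string Verbatim = @"\b[a-zA-Z]+\.?";

    // Regular string: each backslash is doubled, producing the same pattern.
    public const string Doubled = "\\b[a-zA-Z]+\\.?";

    // Unescaped dot: .? matches any single character, not just a period.
    public const string Unescaped = @"\b[a-zA-Z]+.?";

    // Hypothetical helper: returns the text of the first match.
    public static string First(string pattern, string input) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        Console.WriteLine(Verbatim == Doubled);             // True
        Console.WriteLine(First(Verbatim, "word, next"));   // word
        Console.WriteLine(First(Unescaped, "word, next"));  // word,
    }
}
```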

To avoid a match on your example "text..", you not only need \.? to check whether the first character after the word is a dot, but you also need to look one character further and check the second character after the word.

I ended up with something like this: \w{2,}\.?[^.]
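One caveat with a pattern of this shape is that the trailing `[^.]` consumes the character after the word, and backtracking can still let the engine find a match inside "text..". A negative lookahead is one possible way to "look one character further" without consuming anything. The sketch below compares the two; `WithLookahead` is an assumed alternative, not the answerer's pattern, and the class and helper names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DoublePeriodDemo
{
    // Pattern quoted in the answer above.
    public const string FromAnswer = @"\w{2,}\.?[^.]";

    // Assumed alternative: letters, at most one period, and no period right after.
    public const string WithLookahead = @"\b[a-zA-Z]+\b\.?(?!\.)";

    // Hypothetical helper: returns every match of `pattern` in `input` as strings.
    public static string[] FindAll(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string sample = "a word. Then text.. ok";

        // Brackets make any trailing whitespace in a match visible.
        foreach (var m in FindAll(FromAnswer, sample))
            Console.WriteLine("[" + m + "]");

        Console.WriteLine(string.Join("|", FindAll(WithLookahead, sample)));
        // prints: a|word.|Then|ok
    }
}
```

With the lookahead version, "text.." produces no match at all, while single words and words followed by exactly one period are still returned.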

You should also consider that a sentence does not always end with . but can also end with ! or ? and the like.
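To account for other sentence terminators, the period can be replaced by a character class. This is only a sketch under the assumption that a word followed by two or more terminators (such as "Stop..") should be skipped entirely; the class `TerminatorDemo` and its pattern are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TerminatorDemo
{
    // Assumed pattern: ASCII letters, at most one of . ! ?,
    // and no further terminator directly after it.
    public const string Pattern = @"\b[a-zA-Z]+\b[.!?]?(?![.!?])";

    // Hypothetical helper: returns every match of Pattern in `input`.
    public static string[] FindAll(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join("|", FindAll("Done! Ready? Go. Stop..")));
        // prints: Done!|Ready?|Go.
    }
}
```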

I usually use rubulator.com to quickly test a regexp.
