C# Regex: only letters followed by an optional
I am looking for a way to get words out of a sentence. I have gotten pretty far with the following expression:
\b([a-zA-Z]+?)\b
but in some cases it counts a word when I do not want it to, e.g. a word followed by more than one period, like "text..". So, in my regex I want the period to appear at the end of a word zero or one time. Inserting
\\.?
did not do the trick, and variations on this have not yielded anything fruitful either.
Hope someone can help!
A single unescaped dot matches any character (except a newline, by default). You must escape it as
\.?
Maybe you want an expression like this:
\w+\.?
or
\p{L}+\.?
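To see how the two suggestions differ, here is a minimal sketch. The class name `WordPatterns`, the helper `FindAll`, and the sample sentences are made up for illustration. In .NET, `\w` also accepts digits and underscores, while `\p{L}` accepts letters only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordPatterns
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static List<string> FindAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }
}

// Usage (e.g. inside Main):
//   WordPatterns.FindAll(@"\w+\.?", "item_1 costs 5 euros.")
//     yields: item_1, costs, 5, euros.
//   WordPatterns.FindAll(@"\p{L}+\.?", "item_1 costs 5 euros.")
//     yields: item, costs, euros.
```

Note that neither pattern rejects a word followed by two periods on its own: against "text.." both still match "text." and simply leave the second period unmatched.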
You need to add \\.? (and not .?), because the period has special meaning in regexes.
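Whether the backslash has to be doubled depends on how the pattern is written in the C# source. The sketch below assumes the pattern is a string literal; the class and constant names are hypothetical. In a regular literal the backslash must be doubled, while in a verbatim (`@"..."`) literal it must not be.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Regular literal: C# turns "\\" into one backslash, so the regex engine sees \.?
    public const string Regular = "\\.?";

    // Verbatim literal: backslashes are kept as written, so the regex engine also sees \.?
    public const string Verbatim = @"\.?";

    // Doubling inside a verbatim literal changes the meaning: the regex engine sees \\.?
    // which is a literal backslash followed by an optional "any character".
    public const string DoubledVerbatim = @"\\.?";
}
```

If the original attempt used the doubled form inside a verbatim string, that alone could explain why it did not behave as an optional period.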
To avoid a match on your example "text..", you not only need \\.? to check whether the first character after the word is a dot, you also need to look one character further and check the second character after the word. I ended up with something like this:
\\w{2,}\\.?[^.]
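One caveat with a consuming character class such as `[^.]` is that it needs a character after the word (so it fails at the end of the input), and backtracking lets the pattern above still match "text" in "text.." as "tex" followed by "t". A negative lookahead is one way to "look one character further" without consuming anything. The following is only a sketch of that idea, not the answerer's method; the class name `WordExtractor` and the sample sentence are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // \b([a-zA-Z]+)\b  a run of ASCII letters delimited by word boundaries
    // (?!\.\.)         negative lookahead: reject the word if two periods follow
    // \.?              then allow zero or one trailing period
    private static readonly Regex WordPattern =
        new Regex(@"\b([a-zA-Z]+)\b(?!\.\.)\.?");

    // Returns the words (without the trailing period) found in the sentence.
    public static List<string> ExtractWords(string sentence)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(sentence))
        {
            words.Add(m.Groups[1].Value);
        }
        return words;
    }
}

// Usage (e.g. inside Main):
//   WordExtractor.ExtractWords("This is text.. and more.")
//     yields: This, is, and, more
```

Because the lookahead sits right after the closing `\b`, the engine cannot backtrack to a shorter word to get around it, so "text" is skipped entirely.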
You should also consider that a sentence does not always end with a period; it can also end with ! or ? and the like.
I usually use rubulator.com to quickly test a regexp.