C# Regex: only letters followed by an optional period

I am looking for a way to extract the words from a sentence. I have gotten fairly far with the following expression:

\b([a-zA-Z]+?)\b

but in some cases it counts a word that I do not want counted, for example a word followed by more than one period, like "text..". So I want my regex to allow a period at the end of a word zero or one time. Inserting \\.? did not do the trick, and variations on it have not yielded anything useful either.

Hope someone can help!

An unescaped dot matches any character. To match a literal period, you must escape it:

\.?

Maybe you want an expression like this:

\w+\.?

or

\p{L}+\.?
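To make the difference between the escaped and the unescaped dot concrete, here is a minimal C# sketch. The class name OptionalPeriodDemo and the helper AllMatches are hypothetical, chosen only for illustration. It also shows that the verbatim string @"\p{L}+\.?" and the regular string "\\p{L}+\\.?" are the same pattern.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionalPeriodDemo
{
    // Two spellings of the same regex: a verbatim string keeps backslashes as-is,
    // while a regular string literal needs each backslash doubled.
    public const string Verbatim = @"\p{L}+\.?";
    public const string Escaped = "\\p{L}+\\.?";

    // With an unescaped dot, ".?" matches any single character (except a newline).
    public const string UnescapedDot = @"\p{L}+.?";

    // Returns the text of every match of `pattern` in `input`.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Calling AllMatches("Hello world. More text..", OptionalPeriodDemo.Verbatim) gives "Hello", "world.", "More" and "text.", whereas UnescapedDot also swallows the spaces after "Hello" and "More". Note that the word in "text.." is still matched (as "text."), so the optional period on its own does not exclude words followed by two periods.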

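You need to add \\.? (and not .?) because the period has special meaning in regexes. The doubled backslash is only the C# string-literal escape; the regex itself is \.?, which you can also write as the verbatim string @"\.?".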

To avoid a match on your example "text..", it is not enough to add \\.? to check whether the first character after the word is a period; you also need to look one character further and check the second character after the word.

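I ended up with something like this: \\w{2,}\\.?[^.]

Note that [^.] consumes a character, so this pattern needs something after the word and fails at the very end of the input. Because \\w{2,} can backtrack, it can also still match "text" inside "text..". Anchoring the word with a word boundary and using a negative lookahead, e.g. \\b[a-zA-Z]+\\b(?!\\.\\.), avoids both problems.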
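One way to "look one character further" without consuming any input is a negative lookahead. The sketch below assumes the goal is to skip any word that is followed by two or more periods; it is not the pattern from this answer, and the names WordExtractor and ExtractWords are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // \b[a-zA-Z]+\b : a whole word made of ASCII letters
    // (?!\.\.)      : reject the word if two periods follow it directly
    private static readonly Regex WordPattern =
        new Regex(@"\b[a-zA-Z]+\b(?!\.\.)");

    // Returns the words of `sentence`, skipping any word followed by "..".
    public static string[] ExtractWords(string sentence) =>
        WordPattern.Matches(sentence).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For "Some text.. and more." this returns "Some", "and" and "more". The second \b matters: without it the engine could backtrack and match a shorter prefix such as "tex".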

You should also consider that a sentence does not always end with a period; it can also end with ! or ? and the like.
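If other sentence-ending marks should be handled as well, a character class can stand in for the period. This is only a sketch under the assumption that the same "at most one" rule applies to !, ? and . alike, so a word followed by "?!" or ".." is skipped; the names SentenceWordExtractor and Extract are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceWordExtractor
{
    // \b\p{L}+\b : a whole word made of Unicode letters
    // [.!?]?     : optionally followed by one sentence-ending mark
    // (?![.!?])  : but never by a second one (assumed rule)
    private static readonly Regex Pattern =
        new Regex(@"\b\p{L}+\b[.!?]?(?![.!?])");

    // Returns each accepted word together with its trailing mark, if any.
    public static string[] Extract(string text) =>
        Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For "Wait.. what? Yes!" this returns "what?" and "Yes!", and "Wait" is dropped because two periods follow it.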

I usually use rubulator.com to quickly test a regexp.
