I am looking for a way to get the words out of a sentence. I have got fairly far with the following expression:
\b([a-zA-Z]+?)\b
But there are some cases where it matches a word that I do not want it to match, e.g. a word followed by more than one period, like "text..". So in my regex I want the period to appear at the end of a word zero or one time. Inserting \\.? did not do the trick, and variations on it have not yielded anything fruitful either.
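To make the problem concrete, here is a small check in Python (an assumption, since the question does not name a language). The original pattern still extracts "text" from "text..", because \b also matches between the last letter and the first period:

```python
import re

# The pattern from the question: a lazily matched run of ASCII letters
# between two word boundaries.
WORD = re.compile(r"\b([a-zA-Z]+?)\b")

sentence = "Some text.. here"
words = WORD.findall(sentence)
print(words)  # ['Some', 'text', 'here']
```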
Hope someone can help!
An unescaped dot matches any character (except a newline, by default). You must escape it as
\.?
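A minimal Python sketch of the difference. The word "word" and the test strings are just illustrative:

```python
import re

# Unescaped, "." matches any single character, so "word.?" accepts "words".
unescaped = re.fullmatch(r"word.?", "words") is not None    # True

# Escaped, "\." matches only a literal period.
escaped_s = re.fullmatch(r"word\.?", "words") is not None   # False
escaped_dot = re.fullmatch(r"word\.?", "word.") is not None  # True

print(unescaped, escaped_s, escaped_dot)  # True False True
```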
Maybe you want an expression like this:
\w+\.?
or
\p{L}+\.?
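A rough Python sketch of both suggestions, assuming Python's built-in re module. That module has no \p{L}, so [^\W\d_] (word characters minus digits and underscore) stands in here as a common approximation of "letters only". Note that on their own these patterns still return "text." for "text..":

```python
import re

sentence = "Das schöne Zimmer 42 hat text.. und mehr."

# \w+ matches letters, digits and underscore; \.? allows one optional period.
any_word = re.findall(r"\w+\.?", sentence)

# Approximation of \p{L}+\.? for Python's re: letters only, so "42" is skipped.
letters_only = re.findall(r"[^\W\d_]+\.?", sentence)

print(any_word)      # ['Das', 'schöne', 'Zimmer', '42', 'hat', 'text.', 'und', 'mehr.']
print(letters_only)  # ['Das', 'schöne', 'Zimmer', 'hat', 'text.', 'und', 'mehr.']
```

If the real code runs on Java, Pattern supports \p{L} directly, so no approximation is needed there.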
You need to add \\.? (and not .?) because the period has a special meaning in regexes.
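The doubled backslash in \\.? comes from the string literal, not from the regex itself. When a pattern is written inside an ordinary string (as in Java, C#, or a non-raw Python string), \\ in the source is a single backslash, so the regex engine receives \.?. A small Python illustration:

```python
import re

# In a normal string literal, "\\" is one backslash character.
pattern_plain = "word\\.?"
# A raw string keeps the backslash as written, so one is enough.
pattern_raw = r"word\.?"

same = pattern_plain == pattern_raw  # True: both are w o r d \ . ?
length = len(pattern_raw)            # 7
matches = re.fullmatch(pattern_plain, "word.") is not None  # True

print(same, length, matches)  # True 7 True
```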
To avoid a match on your example "text..", it is not enough to put \\.? after the word to check that the first character after it is a dot; you also need to look one character further and check the second character after the word.
I ended up with something like this: \\w{2,}\\.?[^.]
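A quick Python check suggests a caveat with that pattern: [^.] consumes one extra character, and \w{2,} can give a letter back, so "tex" + "t" still matches inside "text..". The sketch below contrasts it with a lookahead-based alternative. This is a different technique from the one above: \b stops the engine from backtracking into the middle of the word, and (?!\.\.) rejects a word directly followed by two periods without consuming anything after it. The pattern name WORD is just illustrative:

```python
import re

sentence = "Some text.. here."

# Pattern from the answer above: note the trailing space in "Some "
# and that "text" is still found inside "text..".
consuming = re.findall(r"\w{2,}\.?[^.]", sentence)

# Lookahead alternative: a word, not followed by "..", then at most one period.
WORD = re.compile(r"\b[A-Za-z]+\b(?!\.\.)\.?")
lookahead = WORD.findall(sentence)

print(consuming)  # ['Some ', 'text', 'here']
print(lookahead)  # ['Some', 'here.']
```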
You should also consider that a sentence does not always end with a period; it can also end with !, ? and the like.
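One possible extension, sketched in Python as an assumption rather than a tested recipe: allow a single trailing ., ! or ?, while still rejecting a word followed by two periods:

```python
import re

# A word may end with one of . ! ? but "text.." is still rejected.
WORD = re.compile(r"\b[A-Za-z]+\b(?!\.\.)[.!?]?")

sentence = "Really? Yes! Some text.. then more."
words = WORD.findall(sentence)
print(words)  # ['Really?', 'Yes!', 'Some', 'then', 'more.']
```

Whether input such as "wait?!" should count is a design choice; this sketch would return "wait?" for it.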
I usually use rubular.com to quickly test a regexp.