Good evening. I have a string like "leicht bewölkt leichter Regen Regen". I need a regex pattern that matches "leicht bewölkt" (two adjectives), "leichter Regen" (adjective and noun) and "Regen" (noun). I have found out how to match a single adjective with "\\b[a-z][a-z]*\\b", but how can I do that with two adjectives, or with one adjective and a noun? I'm a bit lost. Thanks in advance.
A regex matching a single full word starting with an uppercase letter is easy to derive from your current regex: just replace the first character class with its uppercase equivalent:
\b[A-Z][a-z]*\b
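Now we only need to combine the two to match the following patterns: two adjectives (two lowercase words), one adjective followed by one noun (a lowercase word, then a capitalized word), and a single noun (a capitalized word).
We can represent consecutive words by joining them with a single space character.
A basic solution is an alternation of the three patterns listed above: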
\b[a-z][a-z]*\b \b[a-z][a-z]*\b|\b[a-z][a-z]*\b \b[A-Z][a-z]*\b|\b[A-Z][a-z]*\b
^________two adjectives_______^ ^____one adjective one noun___^ ^__one noun__^
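As a rough check of the alternation above, here is a minimal Java sketch. The class name `BasicAlternationDemo`, the helper `findAll`, and the ASCII-only sample input (with "bedeckt" standing in for "bewölkt") are assumptions made for illustration, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BasicAlternationDemo {
    // The three alternatives from the answer, joined with '|'.
    static final Pattern WEATHER = Pattern.compile(
        "\\b[a-z][a-z]*\\b \\b[a-z][a-z]*\\b"       // two adjectives
        + "|\\b[a-z][a-z]*\\b \\b[A-Z][a-z]*\\b"    // one adjective, one noun
        + "|\\b[A-Z][a-z]*\\b");                     // one noun

    // Collects every non-overlapping match, scanning left to right.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = WEATHER.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical ASCII-only input, chosen to avoid the umlaut for now.
        System.out.println(findAll("leicht bedeckt leichter Regen Regen"));
        // prints [leicht bedeckt, leichter Regen, Regen]
    }
}
```

Because the regex engine tries the alternatives from left to right at each position, "leichter Regen" is taken by the second alternative before the lone "Regen" can be picked up by the third.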
It can be improved in multiple ways:
\b[a-z][a-z]*\b can be shortened to \b[a-z]+\b (+ is "one or more", which is the same as one and then "0 or more" *).
There is always a word boundary between [a-z] and a space, therefore the \b after a word and before a space, and those after a space and before a word, can be removed, as they will always match if the word and the space do. In conclusion, I would use the following:
\b[a-z]+ [a-z]+\b|\b[a-z]+ [A-Z][a-z]*\b|\b[A-Z][a-z]*\b
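To illustrate that the simplification does not change what is matched, the following Java sketch runs the verbose and the simplified patterns over a few inputs and compares the results. The class name `SimplifiedPatternDemo`, its helpers, and the ASCII-only sample strings are assumptions for the sake of the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplifiedPatternDemo {
    // Verbose form: every word is wrapped in its own pair of \b.
    static final Pattern VERBOSE = Pattern.compile(
        "\\b[a-z][a-z]*\\b \\b[a-z][a-z]*\\b"
        + "|\\b[a-z][a-z]*\\b \\b[A-Z][a-z]*\\b"
        + "|\\b[A-Z][a-z]*\\b");

    // Simplified form: '+' instead of 'x x*', and no \b next to the space.
    static final Pattern SIMPLIFIED = Pattern.compile(
        "\\b[a-z]+ [a-z]+\\b|\\b[a-z]+ [A-Z][a-z]*\\b|\\b[A-Z][a-z]*\\b");

    // Collects every non-overlapping match of the given pattern.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // True when both patterns produce exactly the same list of matches.
    static boolean sameMatches(String input) {
        return findAll(VERBOSE, input).equals(findAll(SIMPLIFIED, input));
    }

    public static void main(String[] args) {
        String[] samples = {
            "leicht bedeckt leichter Regen Regen",
            "starker Wind",
            "Regen"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + sameMatches(sample));
        }
    }
}
```

The dropped \b are redundant because a letter next to a space is always a word boundary, so removing them only shortens the pattern.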
Testing it on regex101 shows you will have problems with non-ASCII characters (ö isn't matched by [a-z] and isn't considered a word character, unless the UNICODE flag is set).
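The character-level part of that observation can be reproduced in Java with a small sketch. The class name `UnicodeClassDemo` and the helper `matches` are hypothetical; `Pattern.UNICODE_CHARACTER_CLASS` is assumed here to be the Java counterpart of the UNICODE flag mentioned above. The umlaut is written as the escape `\u00f6` so the source file stays ASCII-only.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // Returns true when the whole input matches the regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String oUmlaut = "\u00f6"; // lowercase o with umlaut

        // [a-z] only covers the 26 ASCII lowercase letters.
        System.out.println(matches("[a-z]", 0, oUmlaut)); // false

        // By default \w is ASCII-only as well.
        System.out.println(matches("\\w", 0, oUmlaut)); // false

        // With the flag, \w follows the Unicode definition of a word character.
        System.out.println(matches("\\w", Pattern.UNICODE_CHARACTER_CLASS, oUmlaut)); // true
    }
}
```

How a default-mode \b behaves next to such a letter has varied between Java releases, so the sketch deliberately checks only the character classes.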
To handle the Unicode problem you can use the \p{Ll} ("lowercase letters of any language") and \p{Lu} ("uppercase letters of any language") meta-characters in conjunction with the UNICODE flag / UNICODE_CHARACTER_CLASS for Java (needed for \b to work correctly) instead of your current character classes:
\b\p{Ll}+ \p{Ll}+\b|\b\p{Ll}+ \p{Lu}\p{L}*\b|\b\p{Lu}\p{Ll}*\b
(regex101, Java code on ideone)
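Since the linked demos are not reproduced here, the following is a minimal Java sketch of how the final pattern might be used. The class name `UnicodeWeatherDemo` and the helper `findAll` are assumptions; the pattern itself is the one given above, compiled with `Pattern.UNICODE_CHARACTER_CLASS`. The umlaut in the input is written as `\u00f6` to keep the source ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeWeatherDemo {
    // Final pattern from the answer; the flag makes \b Unicode-aware.
    static final Pattern WEATHER = Pattern.compile(
        "\\b\\p{Ll}+ \\p{Ll}+\\b"             // two adjectives
        + "|\\b\\p{Ll}+ \\p{Lu}\\p{L}*\\b"    // one adjective, one noun
        + "|\\b\\p{Lu}\\p{Ll}*\\b",           // one noun
        Pattern.UNICODE_CHARACTER_CLASS);

    // Collects every non-overlapping match, scanning left to right.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = WEATHER.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The string from the question, with the umlaut escaped.
        String input = "leicht bew\u00f6lkt leichter Regen Regen";
        for (String match : findAll(input)) {
            System.out.println(match);
        }
    }
}
```

On the string from the question this should yield the three expected matches: "leicht bewölkt", "leichter Regen" and "Regen".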