Hard time figuring out the correct regex to match uppercase words

I have a simple requirement. We use the Hibernate validation engine to decide whether a constraint is satisfied (true) or not (false).

A text should evaluate to true if all of its words start with an uppercase character. There are some difficulties:

Words could also start like this:

8-Test
8Test
(Test)
-Test

or anything comparable. They are also usually comma-separated (or use a different separator):

Test, Test, Test

Here are some samples. Expected to match (true):

- Hydroxyisohexyl 3-Cyclohexene Carboxaldehyde, Benzyl
- Test, Test, Test
- CI 15510, Methylchloroisothiazolinone, Disodium EDTA
- N/A
- NA

Expected not to match (false):

- hydroxyisohexyl 3-Cyclohexene Carboxaldehyde, Benzyl
- Test, test, Test
- CI 15510, Methylchloroisothiazolinone, Disodium eDTA
- na
- n/a

My attempts went in these directions:

final String oldregex = "([\\W]*\\b[A-Z\\d]\\w+\\b[\\W]*)+";
final String regex = "([A-Z][\\d\\w]+( [A-Z][-\\d\\w]+)*, )*[A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*\\.";

Actually, with the " " option I ran into a seemingly endless computation for some texts. Use this to test the regex: http://gskinner.com/RegExr/ (without the double backslashes, of course).

Thanks for helping!!!

Regex

See it in action:

^(?:[^A-Za-z]*[A-Z][^\s,]*)*[^A-Za-z]*$

Explanation

^                # start of the string
(?:              # this group matches a "word", don't capture the group
  [^A-Za-z]*     # skip any non-alphabet characters at start of the word
  [A-Z]          # force an uppercase letter as a first letter
  [^\s,]*        # match anything but word separators (\s and ,) after the first letter
)*               # the whole line consists of such "words"
[^A-Za-z]*       # skip any non-alphabet characters at the end of the string
$                # end of the string

Note: You can modify the regex if your word separator characters are different from whitespace and comma (for example, change [^\s,] to [^,:-] or whatever you use).
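
To try the pattern above against the samples from the question, here is a minimal Java sketch. The class name CapitalizedWordsCheck and the helper allWordsCapitalized are made up for illustration and are not part of Hibernate or of the original answer; the only change to the regex is the doubled backslash required by a Java string literal.

```java
import java.util.regex.Pattern;

public class CapitalizedWordsCheck {

    // Regex from the answer; "\\s" is the Java-literal form of \s.
    private static final Pattern ALL_WORDS_CAPITALIZED =
            Pattern.compile("^(?:[^A-Za-z]*[A-Z][^\\s,]*)*[^A-Za-z]*$");

    // Hypothetical helper: true if the whole text matches the pattern.
    static boolean allWordsCapitalized(String text) {
        return ALL_WORDS_CAPITALIZED.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Samples taken from the question (leading/trailing whitespace trimmed).
        String[] samples = {
            "Hydroxyisohexyl 3-Cyclohexene Carboxaldehyde, Benzyl",
            "Test, Test, Test",
            "CI 15510, Methylchloroisothiazolinone, Disodium EDTA",
            "N/A",
            "NA",
            "hydroxyisohexyl 3-Cyclohexene Carboxaldehyde, Benzyl",
            "Test, test, Test",
            "CI 15510, Methylchloroisothiazolinone, Disodium eDTA",
            "na",
            "n/a"
        };
        for (String sample : samples) {
            System.out.println(allWordsCapitalized(sample) + "  <-  " + sample);
        }
    }
}
```

The first five samples should print true and the last five false. Inputs such as 8-Test and (Test) are accepted as well, because the leading non-letters are skipped before the uppercase letter is required.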

I tested this:

^([^A-Za-z]*[A-Z][A-Za-z]*)+?$

It works on your test cases.

EDIT:

^([^A-Za-z]*?[A-Z][A-Za-z]*?)+.?

for performance reasons.

This is what I wanted: matching uppercase words and characters in Java

"^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$"

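A small sketch of how this last pattern could be checked in Java, with the character class written as [A-Z]. The class name StrictCapitalizedWordsCheck and the helper allWordsCapitalized are hypothetical names chosen for this example only.

```java
import java.util.regex.Pattern;

public class StrictCapitalizedWordsCheck {

    // Each word is a run of letters that starts with an uppercase letter and
    // is preceded either by the start of the input or by at least one non-letter.
    private static final Pattern ALL_WORDS_CAPITALIZED =
            Pattern.compile("^((^|[^A-Za-z]+)[A-Z][A-Za-z]*)*[^A-Za-z]*$");

    // Hypothetical helper: true if the whole text matches the pattern.
    static boolean allWordsCapitalized(String text) {
        return ALL_WORDS_CAPITALIZED.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "CI 15510, Methylchloroisothiazolinone, Disodium EDTA",
            "N/A",
            "(Test)",
            "Test, test, Test",
            "Test-test"
        };
        for (String sample : samples) {
            System.out.println(allWordsCapitalized(sample) + "  <-  " + sample);
        }
    }
}
```

Because a word here may contain only letters, every run of letters has to begin with an uppercase one. As far as I can tell, that makes this pattern reject an input such as Test-test, which the [^\s,]* variant from the first answer would accept, since there the lowercase part after the hyphen is absorbed into the preceding word.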
Something like this seems right:

\b[^a-zA-Z,\s]*?[A-Z][^,\s]*?(\b|,)

The \b anchors match the word boundaries. The [^a-zA-Z,\s]*? allows prefixes that aren't letters. Then we have our single uppercase letter with [A-Z], followed by anything that's not a comma or whitespace with [^,\s]*?.
