
How do I write a regular expression that matches only if all three required capture groups match?

I'd like to match strings that are comprised of:

  1. First Initial
  2. Middle Name
  3. Last Name + optional suffix (Jr., Sr., III, etc.)

and not match strings that are comprised of a First Name + Last Name and suffix.

I have the following sample data:

H. Graham Motion
T. James Kelly
J. Palacios Moli
A. Chadwick Box
H. Graham Motion III
T. James Kelly, Jr.
H. Graham Motion II
V. Barboza Jr.

I would like to match all of the strings except the last.

Here is what I have for a regular expression:

^(\w\.)(\s\w+\s[\sI\,\sJSr.]{0,5})*(\w+[\sI\,\sJSr.]{0,5})$

but it is not working. You can see the regular expression here at regex101.

I've tweaked your expression a bit and come up with ^(\w\.)\s(\w+)\s(\w+(?:,?\s(?:I{0,5}|Jr\.|Sr\.))?)$. For the sake of sanity and clarity, I moved the \s out of the capture groups, since I assume you don't define a middle name as a string of word characters with a leading and trailing space. I think I kept the spirit of your definition of a last name + suffix.

(Very verbose) Explanation:

^                             start
(                             1st group (1st initial)
  \w\.                        one word char followed by a period
)
\s                            one whitespace char
(                             2nd group (middle name)
  \w+                         1 or more word chars
)
\s                            one whitespace char
(                             3rd group (last name + optional suffix)
  \w+                         1 or more word chars
  (?:                         non-capturing group (optional suffix)
    ,?                        0 or 1 commas
    \s                        one whitespace char
    (?:I{1,5}|Jr\.|Sr\.)      one of: 1-5 I chars, "Jr." or "Sr."
  )?                          match suffix group 0 or 1 times
)
$                             end
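
To check the breakdown against the sample data from the question, here is a minimal Python sketch. The choice of Python and the names NAME_RE, samples and results are my own assumptions for illustration; the pattern itself is the one explained above, written with re.VERBOSE so each piece can carry its comment.

```python
import re

# The pattern from the explanation, laid out with re.VERBOSE.
# In verbose mode, whitespace and "#" comments inside the pattern are ignored.
NAME_RE = re.compile(r"""
    ^
    (\w\.)                              # group 1: first initial
    \s
    (\w+)                               # group 2: middle name
    \s
    (                                   # group 3: last name + optional suffix
      \w+
      (?: ,? \s (?:I{1,5}|Jr\.|Sr\.) )? # optional suffix
    )
    $
""", re.VERBOSE)

# Sample data copied from the question.
samples = [
    "H. Graham Motion",
    "T. James Kelly",
    "J. Palacios Moli",
    "A. Chadwick Box",
    "H. Graham Motion III",
    "T. James Kelly, Jr.",
    "H. Graham Motion II",
    "V. Barboza Jr.",
]

# Map each sample to its match object (or None when it does not match).
results = {s: NAME_RE.match(s) for s in samples}

for s, m in results.items():
    print(f"{s!r}: {m.groups() if m else 'no match'}")
```

Every sample except the last should match, and the suffix, when present, ends up inside group 3 together with the last name.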

You'll notice that in the breakdown above I changed I{0,5} to I{1,5}, because zero characters doesn't seem like much of a suffix to me. However, I don't see many people with the suffix IIII or IIIII, so you may want to change it to I{1,3}|IV|V. You may also want to require the comma before Jr./Sr. and disallow it before a Roman numeral, rather than leaving it optional after any last name.
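
A rough sketch of that stricter variant, assuming Python and using I{1,3}|IV|V for the numerals so that an empty suffix is never accepted. STRICT_RE and the test strings are hypothetical names and inputs of mine, not part of the question.

```python
import re

# Stricter suffix rules (one possible reading of the suggestions above):
#   - Roman numerals I, II, III, IV, V only, with NO comma before them
#   - Jr. / Sr. only, with a REQUIRED comma before them
STRICT_RE = re.compile(
    r"^(\w\.)\s(\w+)\s(\w+(?:\s(?:I{1,3}|IV|V)|,\s(?:Jr\.|Sr\.))?)$"
)

should_match = [
    "H. Graham Motion",
    "H. Graham Motion III",
    "H. Graham Motion IV",
    "T. James Kelly, Jr.",
]

should_not_match = [
    "T. James Kelly Jr.",      # missing comma before Jr.
    "H. Graham Motion, III",   # comma before a Roman numeral
    "H. Graham Motion IIII",   # not a conventional numeral
    "V. Barboza Jr.",          # no middle name
]

for s in should_match + should_not_match:
    print(f"{s!r}: {'match' if STRICT_RE.match(s) else 'no match'}")
```

Whether a comma really is mandatory before Jr./Sr. depends on your data, so treat the two alternatives as a starting point rather than a rule.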

Also, remember that \w matches underscores and digits as well as letters, and that \s matches most whitespace characters, not just a regular space.
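
To make that caveat concrete, here is a small Python sketch comparing the \w / \s pattern with a tighter one. The tighter pattern assumes names consist only of ASCII letters separated by single spaces, which is my assumption and would reject names such as O'Brien or names with accented letters; LOOSE_RE, ASCII_RE and the inputs are hypothetical.

```python
import re

# The pattern from the explanation, using \w and \s.
LOOSE_RE = re.compile(
    r"^(\w\.)\s(\w+)\s(\w+(?:,?\s(?:I{1,5}|Jr\.|Sr\.))?)$"
)

# Tighter variant: ASCII letters only and a literal space.
# ASSUMPTION: names contain nothing but A-Z / a-z.
ASCII_RE = re.compile(
    r"^([A-Z]\.) ([A-Za-z]+) ([A-Za-z]+(?:,? (?:I{1,5}|Jr\.|Sr\.))?)$"
)

odd_inputs = [
    "7. Gr_ham M0tion",      # digits and an underscore count as \w
    "H.\tGraham\tMotion",    # tabs count as \s
]

for s in odd_inputs + ["H. Graham Motion"]:
    loose = bool(LOOSE_RE.match(s))
    strict = bool(ASCII_RE.match(s))
    print(f"{s!r}: loose={loose}, ascii_only={strict}")
```

The loose pattern accepts both odd inputs, while the ASCII-only pattern accepts only the ordinary name.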
