I'd like to match strings that consist of a First Initial + Middle Name + Last Name, with an optional suffix,
and not match strings that consist of a First Name + Last Name and suffix.
I have the following sample data:
H. Graham Motion
T. James Kelly
J. Palacios Moli
A. Chadwick Box
H. Graham Motion III
T. James Kelly, Jr.
H. Graham Motion II
V. Barboza Jr.
I would like to match all of the strings except the last.
Here is what I have for a regular expression:
^(\w\.)(\s\w+\s[\sI\,\sJSr.]{0,5})*(\w+[\sI\,\sJSr.]{0,5})$
but it is not working. You can see the regular expression here at regex101.
I've tweaked your expression a bit and come up with:
^(\w\.)\s(\w+)\s(\w+(?:,?\s(?:I{1,5}|Jr\.|Sr\.))?)$
For the sake of sanity and clarity, I moved the \s out of the capture groups, since I assume you don't define a middle name as a string of word characters with a leading and trailing space. I think I kept the spirit of your definition of a last name + suffix.
^                          start
(                          1st group (1st initial)
  \w\.                     one word char followed by a period
)
\s                         one whitespace char
(                          2nd group (middle name)
  \w+                      1 or more word chars
)
\s                         one whitespace char
(                          3rd group (last name + optional suffix)
  \w+                      1 or more word chars
  (?:                      non-capturing group (optional suffix)
    ,?                     0 or 1 commas
    \s                     one whitespace char
    (?:I{1,5}|Jr\.|Sr\.)   one of: 1-5 I chars, "Jr." or "Sr."
  )?                       match suffix group 0 or 1 times
)
$                          end
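To check the breakdown against the sample data, here is a minimal sketch in Python (the question doesn't say which language is in use, so Python and the names NAME_RE and SAMPLES are my own assumptions). It writes the same pattern in verbose mode, so the comments mirror the breakdown above.

```python
import re

# Same pattern as above; re.VERBOSE ignores the layout whitespace
# and lets each piece carry a comment.
NAME_RE = re.compile(r"""
    ^(\w\.)                              # group 1: first initial + period
    \s
    (\w+)                                # group 2: middle name
    \s
    (\w+                                 # group 3: last name ...
        (?:,?\s(?:I{1,5}|Jr\.|Sr\.))?    # ... plus optional suffix
    )$
""", re.VERBOSE)

# Sample data copied from the question.
SAMPLES = [
    "H. Graham Motion",
    "T. James Kelly",
    "J. Palacios Moli",
    "A. Chadwick Box",
    "H. Graham Motion III",
    "T. James Kelly, Jr.",
    "H. Graham Motion II",
    "V. Barboza Jr.",
]

results = [bool(NAME_RE.match(s)) for s in SAMPLES]

for sample, matched in zip(SAMPLES, results):
    print(f"{sample!r}: {'match' if matched else 'no match'}")
```

Every sample should match except the last one. "V. Barboza Jr." fails because "Jr" gets consumed as the last name and the trailing period has nothing left to match it.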
You'll notice I made the change from I{0,5} to I{1,5}, because 0 characters doesn't seem like much of a suffix to me. However, I don't see a lot of people with the suffix IIII or IIIII, so you may want to change it to I{1,3}|IV|V. You may also want to change the optional comma after the last name so that it is required before Jr./Sr. and disallowed before a Roman numeral.
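If you want to try that stricter suffix handling, here is one way it could look. This is a sketch in Python, and STRICT_RE is just a name I made up; the suffix is split into two alternatives, a Roman numeral (I to V) with no comma, or a comma followed by Jr. or Sr.

```python
import re

# Suffix is either " <Roman numeral I-V>" (no comma allowed)
# or ", Jr." / ", Sr." (comma required).
STRICT_RE = re.compile(
    r"^(\w\.)\s(\w+)\s"
    r"(\w+(?:\s(?:I{1,3}|IV|V)|,\s(?:Jr|Sr)\.)?)$"
)

def is_match(name: str) -> bool:
    """Return True if the whole string fits the stricter pattern."""
    return STRICT_RE.match(name) is not None

print(is_match("H. Graham Motion III"))   # Roman numeral, no comma
print(is_match("T. James Kelly, Jr."))    # comma before Jr.
print(is_match("H. Graham Motion IIII"))  # not a valid numeral here
print(is_match("T. James Kelly Jr."))     # missing comma
print(is_match("H. Graham Motion, III"))  # comma before a numeral
```

The first two calls should print True and the last three False. Whether a missing comma before Jr. should really be rejected depends on your data, so treat this as an option rather than a fix.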
Also, remember that \w also matches underscores and digits! And that \s matches most whitespace characters, and not just a regular space.
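A quick way to see both effects, again sketched in Python with made-up names (LOOSE_RE, ASCII_RE). The second pattern swaps \w for [A-Za-z] and \s for a literal space. That is an assumption on my part about what you want: it will also reject accented letters, hyphens and apostrophes, which real names do contain.

```python
import re

# The pattern from above, using \w and \s.
LOOSE_RE = re.compile(r"^(\w\.)\s(\w+)\s(\w+(?:,?\s(?:I{1,5}|Jr\.|Sr\.))?)$")

# Hypothetical stricter variant: ASCII letters and literal spaces only.
ASCII_RE = re.compile(
    r"^([A-Za-z]\.) ([A-Za-z]+) ([A-Za-z]+(?:,? (?:I{1,5}|Jr\.|Sr\.))?)$"
)

odd_inputs = [
    "7. Gra_ham Mot1on",    # digits and an underscore are "word" chars
    "H.\tGraham\tMotion",   # tabs count as whitespace for \s
]

for text in odd_inputs:
    loose = bool(LOOSE_RE.match(text))
    strict = bool(ASCII_RE.match(text))
    print(f"{text!r}: loose={loose}, ascii_only={strict}")
```

Both odd inputs are accepted by the \w/\s version and rejected by the letters-and-spaces version, while ordinary inputs such as "H. Graham Motion III" pass both.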