
simple regex to find two words

I have a quick question about a regex that is driving me crazy:

    import re

    sentence = "Dr. Peter is a great man. Dr. med. Lumpert Mercury is a great man."
    for m in re.finditer(r"(Dr\.|med\.)\s([A-Z][a-z]+)", sentence):
        print('%02d-%02d: %s' % (m.start(), m.end(), m.group(2)))

This code gives me every word that follows "Dr." or "med." if the word begins with a capital letter. Now I need to capture two words after the string, again only if both begin with a capital letter. I tried things like:

    for m in re.finditer(r"(Dr\.|med\.)\s(([A-Z][a-z]+)|([A-Z][a-z]+)\s([A-Z][a-z]+))", sentence):
        print('%02d-%02d: %s' % (m.start(), m.end(), m.group(2, 3)))

You can see how I got tangled up there. How can I match "Lumpert Mercury" as well as "Peter"?

I need both cases: one word or two words after "Dr." or "med.", so the result should be "Peter" and "Lumpert Mercury".

Use a non-capturing group for the second word and make it optional, inside the original capturing group.

>>> import re
>>> s = "Dr. Peter is a great man. Dr. med. Lumpert Mercury is a great man."
>>> for m in re.finditer(r"(?:Dr|med)\.\s*([A-Z][a-z]+(?: [A-Z][a-z]+)?)", s):
...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group(1)))

00-09: Peter
30-50: Lumpert Mercury
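
To make the pieces of that pattern easier to see, here is a minimal sketch of the same regex written with `re.VERBOSE` and wrapped in a small helper. The names `TITLE_NAME` and `find_titled_names` are made up for illustration and are not part of the original answer; the literal space is written as `[ ]` because verbose mode ignores bare whitespace in the pattern.

```python
import re

# Hypothetical names; the pattern is the one from the answer above,
# spelled out with re.VERBOSE so each part can carry a comment.
TITLE_NAME = re.compile(r"""
    (?:Dr|med)\.              # a title: "Dr." or "med." (non-capturing)
    \s*                       # optional whitespace after the title
    (                         # group 1: the name
        [A-Z][a-z]+           #   first capitalized word
        (?:[ ][A-Z][a-z]+)?   #   optionally a space and a second capitalized word
    )
""", re.VERBOSE)


def find_titled_names(text):
    """Return (start, end, name) for each title followed by one or two capitalized words."""
    return [(m.start(), m.end(), m.group(1)) for m in TITLE_NAME.finditer(text)]


sentence = "Dr. Peter is a great man. Dr. med. Lumpert Mercury is a great man."
for start, end, name in find_titled_names(sentence):
    print("%02d-%02d: %s" % (start, end, name))
```

The alternation attempt in the question stops at one word because the regex engine tries the alternatives left to right and the single-word branch already succeeds. Making the second word an optional group lets the engine take it when it is there and skip it otherwise.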

You need to use the following regular expression:

(?:Dr|med)\.\s*([A-Z][a-z]*)
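
For comparison, a quick sketch of what this pattern returns on the sentence from the question. It captures only a single word after the title, so "Mercury" is left out, and because it uses `*` rather than `+` it would also accept a lone capital letter. `ONE_WORD` is just an illustrative name.

```python
import re

sentence = "Dr. Peter is a great man. Dr. med. Lumpert Mercury is a great man."

# Illustrative name; the pattern itself is copied from the answer above.
ONE_WORD = re.compile(r"(?:Dr|med)\.\s*([A-Z][a-z]*)")

one_word_names = [m.group(1) for m in ONE_WORD.finditer(sentence)]
print(one_word_names)  # ['Peter', 'Lumpert']
```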
