
Get sentence after pattern with regex python

In my string (example adapted from this tutorial), I want to get everything up to the first following . after the generic (year). pattern:

str = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'

I think I'm almost there with my code but not quite yet:

test = re.findall(r'[\(\d\d\d\d\).-]+([^.]*)', str)

... which returns: ['com, (2002)', 'blah monkey', ' (1991)', '@abc', 'com blah dishwasher']

The desired output is:

['blah monkey', '@abc']

In other words, I want to find everything that is between the year pattern and the next dot.

If you want to get everything between (year). and the first following . you can use this:

\(\d{4}\)\.([^.]*)

See Live Demo .

And here is an explanation:

"\(\d{4}\)\.([^.]*)"g

\( matches the character ( literally
\d{4} match a digit [0-9]
    Quantifier: {4} Exactly 4 times
\) matches the character ) literally
\. matches the character . literally
1st Capturing group ([^.]*)
    [^.]* match a single character not present in the list below
        Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        . the literal character .
g modifier: global. All matches (don't return on first match)
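
As a minimal sketch of how that pattern could be applied in Python: re.findall already returns every match, so the g modifier is not needed. The names text and year_then_sentence are only illustrative and do not come from the question.

```python
import re

text = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'

# \(\d{4}\)\.  -> a literal "(", four digits, a literal ")" and a literal "."
# ([^.]*)      -> capture everything up to, but not including, the next "."
year_then_sentence = re.compile(r'\(\d{4}\)\.([^.]*)')

# With a single capturing group, findall returns only the captured text.
matches = year_then_sentence.findall(text)
print(matches)  # ['blah monkey', '@abc']
```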

You are using [...] in the wrong way. Try \(\d{4}\)\.([^.]*)\. instead:

>>> import re
>>> s = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'
>>> re.findall(r'\(\d{4}\)\.([^.]*)\.', s)
['blah monkey', '@abc']

For reference, [...] specifies a character class. By using [\(\d\d\d\d\).-] you were saying: match any one of the characters 0123456789().- .
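
To make the character-class point concrete, here is a small sketch comparing the bracketed form with the same tokens written as a sequence. The variable names are made up for illustration.

```python
import re

text = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'

# Inside [...], every item is an alternative for ONE character, so repeating
# \d four times adds nothing: both classes accept the same characters.
original_class = re.findall(r'[\(\d\d\d\d\).-]+', text)
simplified_class = re.findall(r'[()\d.-]+', text)
print(original_class == simplified_class)  # True
print(original_class)  # ['.', '(2002).', '.', '(1991).', '.']

# Outside brackets, the same tokens describe an ordered sequence instead.
as_sequence = re.findall(r'\(\d{4}\)\.', text)
print(as_sequence)  # ['(2002).', '(1991).']
```

This is why the original attempt also started matching at every lone dot, such as the one in google.com.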

This should do the trick:

print(re.findall(r'\(\d{4}\)\.([^\.]+)', str))
# ['blah monkey', '@abc']
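
The three patterns suggested on this page agree on the example string but can differ on edge cases. The sketch below compares them on two made-up inputs that are not from the question: one with no closing dot, and one where a second dot follows immediately.

```python
import re

patterns = {
    'star': r'\(\d{4}\)\.([^.]*)',        # zero or more non-dot characters
    'star_dot': r'\(\d{4}\)\.([^.]*)\.',  # same, but a closing dot is required
    'plus': r'\(\d{4}\)\.([^.]+)',        # at least one non-dot character
}

# Hypothetical inputs chosen only to expose the differences.
samples = ['(2000).no closing dot', '(2000)..empty sentence']

results = {
    name: [re.findall(pattern, sample) for sample in samples]
    for name, pattern in patterns.items()
}

for name, found in results.items():
    print(name, found)
# star [['no closing dot'], ['']]
# star_dot [[], ['']]
# plus [['no closing dot'], []]
```

Which behavior is preferable depends on the data: requiring the trailing \. skips text that is never terminated by a dot, while + skips empty captures.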
