In my string (example adapted from this tutorial) I want to get everything up to the first . that follows the generic (year). pattern:
str = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'
I think I'm almost there with my code but not quite yet:
test = re.findall(r'[\(\d\d\d\d\).-]+([^.]*)', str)
... which returns: ['com, (2002)', 'blah monkey', ' (1991)', '@abc', 'com blah dishwasher']
The desired output is:
['blah monkey', '@abc']
In other words, I want to find everything that is between the year pattern and the next dot.
If you want to get everything between (year). and the first following ., you can use this:
\(\d{4}\)\.([^.]*)
See Live Demo.
And here is the explanation:
"\(\d{4}\)\.([^.]*)"g
\( matches the character ( literally
\d{4} match a digit [0-9]
Quantifier: {4} Exactly 4 times
\) matches the character ) literally
\. matches the character . literally
1st Capturing group ([^.]*)
[^.]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
. the literal character .
g modifier: global. All matches (don't return on first match). In Python, re.findall already returns all matches, so no flag is needed.
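A minimal sketch applying the pattern above with Python's re module. When a pattern has exactly one capturing group, re.findall returns just that group for every match, which is why only the text after (year). comes back. The helper name text_after_year is hypothetical, chosen only for illustration.

```python
import re

# \(\d{4}\)  -> a literal "(", exactly four digits, a literal ")"
# \.         -> the literal dot right after the year
# ([^.]*)    -> capture everything up to (not including) the next dot
YEAR_THEN_TEXT = re.compile(r'\(\d{4}\)\.([^.]*)')


def text_after_year(s):
    """Hypothetical helper: return the text between each '(YYYY).' and the next '.'."""
    # With a single capturing group, findall returns the group, not the whole match
    return YEAR_THEN_TEXT.findall(s)


s = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'
print(text_after_year(s))  # ['blah monkey', '@abc']
```

Because [^.]* stops at the next dot or at the end of the string, this version also captures trailing text that has no closing dot.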
You are using [...] the wrong way. Try \(\d{4}\)\.([^.]*)\. instead:
>>> import re
>>> s = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'
>>> re.findall(r'\(\d{4}\)\.([^.]*)\.', s)
['blah monkey', '@abc']
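The trailing \. in this version only matters when the text after the last (year). has no closing dot: with it, that last segment is skipped; without it, the segment is captured up to the end of the string. A small comparison, using a made-up input string t for illustration:

```python
import re

with_dot = r'\(\d{4}\)\.([^.]*)\.'   # requires a closing dot after the captured text
without_dot = r'\(\d{4}\)\.([^.]*)'  # captures up to the next dot or the end of the string

t = 'see (2010).first part. and (2015).no closing dot'
print(re.findall(with_dot, t))     # ['first part']
print(re.findall(without_dot, t))  # ['first part', 'no closing dot']
```

On the string from the question both patterns give the same result, since every (year). segment there is followed by a dot.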
For reference, [...] specifies a character class. By using [\(\d\d\d\d\).-] you were saying: one of 0123456789().- (and the + after it accepts any run of those characters, including a single dot).
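To see the effect, here is the character class from the question run on its own against the same string. It matches any run of digits, parentheses, dots or hyphens, so a lone dot such as the one in google.com qualifies just as well as (2002). does, which is why the original findall produced extra segments.

```python
import re

s = 'purple alice@google.com, (2002).blah monkey. (1991).@abc.com blah dishwasher'

# Inside [...], \( \) . and - are just members of the set, so this class
# matches any single character from 0-9 ( ) . - ; the + repeats it.
year_class = r'[\(\d\d\d\d\).-]+'

print(re.findall(year_class, s))  # ['.', '(2002).', '.', '(1991).', '.']
```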
This should do the trick:
import re
print(re.findall(r'\(\d{4}\)\.([^\.]+)', str))
Output: ['blah monkey', '@abc']
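This answer uses [^.]+ instead of [^.]*, and the backslash in [^\.] is optional because . is already literal inside a character class. The + only changes the result when a (year). is immediately followed by another dot; the input below is made up to show that edge case.

```python
import re

t = 'a (2001).. b (2002).text.'

print(re.findall(r'\(\d{4}\)\.([^.]*)', t))   # ['', 'text']  (* allows an empty capture)
print(re.findall(r'\(\d{4}\)\.([^.]+)', t))   # ['text']      (+ needs at least one character)
print(re.findall(r'\(\d{4}\)\.([^\.]+)', t))  # ['text']      ([^\.] is the same set as [^.])
```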