How to create non-greedy regular expression from right?

Question

I have a file named 'ab9c_xy8z_12a3.pdf'. I want to capture the part after the last underscore and before '.pdf'. Writing a regular expression like this:

    import re

    s = 'ab9c_xy8z_12a3.pdf'
    m = re.search(r'_.*?\.pdf', s)
    m.group(0)

returns: '_xy8z_12a3.pdf'

In this example, I would like to capture only the '12a3' part. Thank you for your help.

Answer 1

The _.*?\.pdf regex matches the first underscore with _, then matches zero or more characters other than a newline, as few as possible, up to the leftmost occurrence of .pdf that follows, which here is at the end of the string. The . matched the remaining underscores on its way to .pdf for two reasons: a regex engine scans the string from left to right, so the match starts at the first underscore, and . is allowed to match an underscore. The lazy quantifier only shortens the right end of a match; it never moves the starting point to the right.
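To make the point above concrete, here is a small sketch that compares the lazy pattern from the question with a greedy variant on the same string. The variable names lazy and greedy are only illustrative. Both searches start at the first underscore, so laziness changes nothing here:

```python
import re

s = 'ab9c_xy8z_12a3.pdf'

# Lazy quantifier, as in the question.
lazy = re.search(r'_.*?\.pdf', s)
# Greedy quantifier, for comparison.
greedy = re.search(r'_.*\.pdf', s)

print(lazy.group(0))    # _xy8z_12a3.pdf
print(greedy.group(0))  # _xy8z_12a3.pdf
print(lazy.start())     # 4, the index of the first underscore
```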

You can fix the pattern by using the negated character class [^_] instead of . so that the match cannot cross an underscore. Since [^_]+ must be immediately followed by .pdf, only the segment after the last underscore can match:

    ([^_]+)\.pdf

Then grab the Group 1 value.

Python demo:

    import re
    rx = r"([^_]+)\.pdf"
    s = "ab9c_xy8z_12a3.pdf"
    m = re.search(rx, s)
    if m:
        print(m.group(1)) # => 12a3
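A slightly stricter variant, offered as a sketch rather than as part of the original answer: if the value should only be returned when an underscore is actually present and .pdf is at the very end of the name, the pattern can include the underscore and a $ anchor. The helper name last_segment is hypothetical.

```python
import re

# Hypothetical helper name, chosen for this illustration only.
def last_segment(filename):
    # _        the underscore immediately before the wanted part
    # ([^_]+)  one or more non-underscore characters (Group 1)
    # \.pdf$   a literal ".pdf" at the end of the string
    m = re.search(r"_([^_]+)\.pdf$", filename)
    return m.group(1) if m else None

print(last_segment("ab9c_xy8z_12a3.pdf"))  # 12a3
print(last_segment("report.pdf"))          # None (no underscore)
```

With the unanchored ([^_]+)\.pdf pattern, a name without any underscore such as "report.pdf" would yield "report" instead, so which variant fits depends on how such names should be treated.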

Answer 2

Use re.split instead:

>>> re.split('[_.]', 'ab9c_xy8z_12a3.pdf')[-2]
'12a3'
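One caveat with splitting on [_.] is that a dot inside the last segment also becomes a split point. As an alternative sketch that uses no regex at all (a different technique from the one in this answer), str.rpartition can cut at the last underscore and os.path.splitext can drop the extension. The sample name with an extra dot is made up for illustration.

```python
import os.path
import re

s = "ab9c_xy8z_12a3.pdf"
tail = s.rpartition("_")[2]        # text after the last underscore: '12a3.pdf'
stem = os.path.splitext(tail)[0]   # drop the extension: '12a3'
print(stem)

# Made-up name with a dot inside the last segment.
dotted = "ab9c_xy8z_1.2a3.pdf"
split_result = re.split("[_.]", dotted)[-2]                        # '2a3'
partition_result = os.path.splitext(dotted.rpartition("_")[2])[0]  # '1.2a3'
print(split_result, partition_result)
```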

2 answers

solution1
2 ACCEPTED 2018-04-02 20:07:51

solution2
1 2018-04-02 18:59:03
