How do I make a regular expression non-greedy from the right?

Question

I have a file named 'ab9c_xy8z_12a3.pdf'. I want to capture the part after the last underscore and before '.pdf'. Writing a regular expression like this:

    import re

    s = 'ab9c_xy8z_12a3.pdf'
    m = re.search(r'_.*?\.pdf', s)
    m.group(0)

returns: '_xy8z_12a3.pdf'

In this example, I would like to capture only the '12a3' part. Thank you for your help.

Answer 1

The _.*?\.pdf regex matches the first underscore with _ , then .*? matches any 0+ chars other than a newline, as few as possible, up to the leftmost occurrence of .pdf after that point, which turns out to be at the end of the string. So .*? matched every underscore on its way to .pdf , simply because a regex engine scans the string from left to right and the . pattern also matches underscores. Laziness only controls where the match ends, not where it starts.
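To check that laziness affects only the end of a match, a small comparison along these lines may help. It is only an illustrative sketch; the variable names lazy and greedy are not from the original question.

```python
import re

s = "ab9c_xy8z_12a3.pdf"

# The engine tries start positions from left to right, so both patterns
# begin matching at the first underscore (index 4).
lazy = re.search(r"_.*?\.pdf", s)
greedy = re.search(r"_.*\.pdf", s)

# Only one ".pdf" exists in the string, so both matches also end at the
# same place and the lazy quantifier makes no difference here.
print(lazy.start(), lazy.group(0))
print(greedy.start(), greedy.group(0))
```

Both lines print the same start index and the same matched text, which shows that switching between .* and .*? cannot move the start of the match to the last underscore.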

You may fix the pattern by using a negated character class, [^_] , instead of the . pattern; this "subtracts" underscores from what . would match:

([^_]+)\.pdf

and grab the Group 1 value.

Python demo:

import re
rx = r"([^_]+)\.pdf"
s = "ab9c_xy8z_12a3.pdf"
m = re.search(rx, s)
if m:
    print(m.group(1)) # => 12a3
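Two variations on the same idea are sketched below, in case a stricter match is wanted. The helper names last_segment and last_segment_greedy are hypothetical, and both versions assume the name must contain an underscore and end in .pdf, which goes slightly beyond what the answer's pattern requires.

```python
import re

def last_segment(name):
    """Text between the last underscore and a trailing '.pdf', or None."""
    # Requiring the leading "_" and the "$" anchor rejects names that have
    # no underscore or that do not end in ".pdf".
    m = re.search(r"_([^_]+)\.pdf$", name)
    return m.group(1) if m else None

def last_segment_greedy(name):
    """Same goal, using a greedy prefix to reach the last underscore."""
    # The greedy ".*" consumes as much as possible, so the "_" that follows
    # it ends up matching the last underscore in the name.
    m = re.match(r".*_(.*)\.pdf$", name)
    return m.group(1) if m else None

print(last_segment("ab9c_xy8z_12a3.pdf"))
print(last_segment_greedy("ab9c_xy8z_12a3.pdf"))
print(last_segment("report.pdf"))
```

The first two calls print 12a3 and the third prints None. The greedy-prefix form is the usual way to get "match from the right" behaviour out of an engine that scans from the left.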

Answer 2

Use re.split instead:

>>> re.split('[_.]', 'ab9c_xy8z_12a3.pdf')[-2]
'12a3'
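Note that splitting on both _ and . assumes the wanted part contains no dot of its own. If that cannot be guaranteed, a regex-free sketch using only string methods might look like the following; the helper name last_segment_no_regex and its ext parameter are hypothetical.

```python
import re

def last_segment_no_regex(name, ext=".pdf"):
    """Text between the last underscore and the extension, or None."""
    if not name.endswith(ext):
        return None
    stem = name[:-len(ext)]            # drop the extension
    _, sep, tail = stem.rpartition("_")  # split at the last underscore
    return tail if sep else None       # sep is empty when no "_" exists

print(last_segment_no_regex("ab9c_xy8z_12a3.pdf"))

# A name with an extra dot shows where the two approaches differ.
print(re.split("[_.]", "a_b.c.pdf")[-2])
print(last_segment_no_regex("a_b.c.pdf"))
```

For the original file name both approaches give 12a3. For a_b.c.pdf the split version returns c while the rpartition version returns b.c, so the choice depends on which result is wanted.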

Answer 1
2 Accepted 2018-04-02 20:07:51

Answer 2
1 2018-04-02 18:59:03
