How to create non-greedy regular expression from right?
I have a file named 'ab9c_xy8z_12a3.pdf'. I want to capture the part after the last underscore and before '.pdf'. Writing a regular expression like:
import re
s = 'ab9c_xy8z_12a3.pdf'
m = re.search(r'_.*?\.pdf', s)
m.group(0)
returns: '_xy8z_12a3.pdf'
In this example, I would like to capture only the '12a3' part. Thank you for your help.
The _.*?\.pdf regex matches the first underscore with _, then matches any 0+ chars other than a newline, as few as possible, up to the leftmost occurrence of .pdf after it, which turns out to be at the end of the string. So the . pattern matched all underscores on its way to .pdf, simply because a regex engine scans the string from left to right and . matches an underscore just like any other character.
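To make the left-to-right behaviour concrete, here is a small sketch (the variable names and the second sample string are mine, not from the question). It shows that the lazy quantifier only changes where a match ends, never where it starts:

```python
import re

s = 'ab9c_xy8z_12a3.pdf'

# The engine tries start positions from left to right, so the match
# begins at the FIRST underscore (index 4) even though .*? is lazy.
lazy = re.search(r'_.*?\.pdf', s)
print(lazy.start(), lazy.group(0))

# Laziness matters only when there is more than one possible end.
s2 = 'a_b.pdf_c.pdf'  # hypothetical string with two '.pdf' occurrences
lazy_hit = re.search(r'_.*?\.pdf', s2).group(0)    # stops at the first '.pdf'
greedy_hit = re.search(r'_.*\.pdf', s2).group(0)   # runs to the last '.pdf'
print(lazy_hit, greedy_hit)
```

In both cases the match starts at the leftmost underscore, which is why making the quantifier lazy cannot give "the last underscore".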
You may fix the pattern by using a negated character class [^_] instead of ., which will "subtract" underscores from the . pattern:
([^_]+)\.pdf
and grab the Group 1 value. See the regex demo.
Python demo:
import re
rx = r"([^_]+)\.pdf"
s = "ab9c_xy8z_12a3.pdf"
m = re.search(rx, s)
if m:
    print(m.group(1)) # => 12a3
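If the name could contain '.pdf' somewhere other than the very end, the pattern above would return the first such hit. A minimal sketch of an anchored variant, assuming the extension is always the literal suffix '.pdf' (the helper name last_part is hypothetical, not part of the answer):

```python
import re

def last_part(filename):
    """Return the underscore-free text right before a trailing '.pdf', or None."""
    # $ ties the match to the end of the string, so an earlier '.pdf' is ignored.
    m = re.search(r'([^_]+)\.pdf$', filename)
    return m.group(1) if m else None

print(last_part('ab9c_xy8z_12a3.pdf'))
print(last_part('ab9c_xy8z_12a3.txt'))
```

Note that for a name with no underscore at all, this returns everything before '.pdf'.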
Use re.split instead:
>>> re.split('[_.]', 'ab9c_xy8z_12a3.pdf')[-2]
'12a3'
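The split approach works because the extension is the last piece, so the wanted part is second from the end. The sketch below prints the intermediate list and then shows the same idea with plain str.rsplit, an alternative I am adding for comparison rather than something from the answer above:

```python
import re

s = 'ab9c_xy8z_12a3.pdf'

# Splitting on every underscore or dot leaves the extension as the last item.
pieces = re.split('[_.]', s)
print(pieces)        # the wanted part is pieces[-2]

# Same result without a regex: drop the extension, then take the last chunk.
stem = s.rsplit('.', 1)[0]
part = stem.rsplit('_', 1)[-1]
print(part)
```

Both versions assume the name ends in a single extension; the re.split version would pick the wrong piece if the part after the last underscore itself contained a dot.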