How to modify this regular expression to extract strings with this pattern?
I'm trying to extract the string that starts right after a quotation mark " and ends with .pdf. For example:
"../matlab/license_admin.pdf" abc "vfv" -> ../matlab/license_admin.pdf
"license_admin.pdf" xyz -> license_admin.pdf
I tried the following code:
import re
base = '"../matlab/license_admin.pdf" abc "vfv"'
base1 = '"license_admin.pdf" xyz'
result = re.findall(r'\b(\S+\.pdf)\b', base)
result1 = re.findall(r'\b(\S+\.pdf)\b', base1)
print(result)
print(result1)
but it only works with my second example. The code removes the ../ in my first one.
Could you please help me modify the regular expression \b(\S+\.pdf)\b to achieve my goal? Thank you so much!
Use
import re
bases = ['"../matlab/license_admin.pdf" abc "vfv"', '"license_admin.pdf" xyz']
for base in bases:
    # Match the opening quote, then capture as little as possible up to .pdf
    m = re.search(r'"(.*?\.pdf)', base)
    if m:
        print(m.group(1))
See the Python demo.
Output:
../matlab/license_admin.pdf
license_admin.pdf
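As for why the pattern in the question drops ../ from the first string: \b only matches between a word character and a non-word character, and ", . and / are all non-word characters. The short comparison below is a sketch that reuses the question's first input; the variable names original and fixed are only illustrative.

```python
import re

base = '"../matlab/license_admin.pdf" abc "vfv"'

# \b needs a word character on one side, so it cannot match between '"' and '.'.
# The first place it can match is just before the 'm' of 'matlab'.
original = re.findall(r'\b(\S+\.pdf)\b', base)
print(original)

# Anchoring on the opening quote instead keeps the leading '../'.
fixed = re.findall(r'"(.*?\.pdf)', base)
print(fixed)
```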
The "(.*?\.pdf) pattern matches ", then captures into Group 1 any zero or more characters other than line break characters, as few as possible, followed by .pdf. With re.search, you get the first match only, and m.group(1) accesses the Group 1 value.
See the regex demo.
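If a single string may hold several quoted paths, re.search is not enough, and simply switching the lazy pattern to re.findall can start a match at the closing quote of an earlier string. A possible variant, assuming the paths themselves never contain a quotation mark and are always closed by one, is to use [^"]* and require the closing quote. The input text below is made up for illustration and is not from the question.

```python
import re

# Hypothetical input: two quoted PDF paths and one quoted non-PDF string.
text = '"../matlab/license_admin.pdf" abc "vfv" and "docs/guide.pdf"'

# The lazy pattern restarts at the closing quote of the first path and
# runs across unrelated text until the next '.pdf'.
loose = re.findall(r'"(.*?\.pdf)', text)
print(loose)

# [^"]* cannot cross a quotation mark, so each match stays inside one
# quoted string, and the trailing '"' requires the string to end in .pdf.
pdf_paths = re.findall(r'"([^"]*\.pdf)"', text)
print(pdf_paths)
```

The stricter pattern also skips quoted strings such as "vfv" that do not end in .pdf, which the lazy pattern would otherwise swallow into a later match.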