
How to modify this regular expression to extract strings with this pattern?

I'm trying to extract the strings that sit between a quotation mark " and .pdf. For example, "../matlab/license_admin.pdf" abc "vfv" -> ../matlab/license_admin.pdf and "license_admin.pdf" xyz -> license_admin.pdf. I tried the following code:

import re

base = '"../matlab/license_admin.pdf" abc "vfv"'
base1 = '"license_admin.pdf" xyz'

result = re.findall(r'\b(\S+\.pdf)\b', base)
result1 = re.findall(r'\b(\S+\.pdf)\b', base1) 

print(result)
print(result1)

But it only works with my second example. In my first one, the code drops the leading ../ and returns matlab/license_admin.pdf.

Could you please help me modify the regular expression \b(\S+\.pdf)\b to achieve my goal? Thank you so much!

Use

import re

bases = ['"../matlab/license_admin.pdf" abc "vfv"', '"license_admin.pdf" xyz']
for base in bases:
    m = re.search(r'"(.*?\.pdf)', base)
    if m:
        print(m.group(1))

See the Python demo.

Output:

../matlab/license_admin.pdf
license_admin.pdf

The "(.*?\.pdf) pattern matches a " character, then captures into Group 1 any zero or more characters other than line break characters, as few as possible, followed by .pdf. With re.search, you get only the first match, and m.group(1) accesses the Group 1 value.
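To see why the original pattern lost the ../ prefix, it may help to run both patterns side by side. The sketch below is only an illustration built on the first sample string from the question; the variable names old and new are made up for the comparison. A \b boundary needs a word character on one side, so a match cannot start in front of . or / and instead begins at the m of matlab.

```python
import re

base = '"../matlab/license_admin.pdf" abc "vfv"'

# Original pattern: \b requires a word character on one side, so the
# match cannot begin before "." or "/" and starts at "m" instead.
old = re.findall(r'\b(\S+\.pdf)\b', base)

# Pattern from the answer: anchor on the opening quote instead of \b.
m = re.search(r'"(.*?\.pdf)', base)
new = m.group(1) if m else None

print(old)  # ['matlab/license_admin.pdf']
print(new)  # ../matlab/license_admin.pdf
```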

See the regex demo.
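Since re.search stops at the first match, a string holding several quoted .pdf paths needs something like re.findall. The following is a hedged sketch, not part of the original answer: the text value is a hypothetical input, and the stricter pattern "([^"]*\.pdf)" is one possible variant that assumes every path is closed by a quote right after .pdf. It also shows that the lazy .*? can run across a closing quote when an unrelated quoted token comes first.

```python
import re

# Hypothetical input: one unrelated quoted token and two quoted .pdf paths.
text = '"vfv" see "../matlab/license_admin.pdf" and "notes.pdf"'

# .*? is lazy but may still cross a quote to reach the next ".pdf".
lazy = re.findall(r'"(.*?\.pdf)', text)

# [^"]* cannot cross a quote, and a closing quote must follow ".pdf".
strict = re.findall(r'"([^"]*\.pdf)"', text)

print(lazy)    # ['vfv" see "../matlab/license_admin.pdf', ' and "notes.pdf']
print(strict)  # ['../matlab/license_admin.pdf', 'notes.pdf']
```

For the two sample strings in the question, both patterns give the same first result; the stricter form only matters when other quoted text appears before a path.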
