How do I modify this regular expression to extract strings with this pattern?

Question

I'm trying to extract the string between the opening quotation mark " and .pdf. For example, "../matlab/license_admin.pdf" abc "vfv" -> ../matlab/license_admin.pdf and "license_admin.pdf" xyz -> license_admin.pdf. I tried the following code:

import re

base = '"../matlab/license_admin.pdf" abc "vfv"'
base1 = '"license_admin.pdf" xyz'

result = re.findall(r'\b(\S+\.pdf)\b', base)
result1 = re.findall(r'\b(\S+\.pdf)\b', base1) 

print(result)
print(result1)

but it only works with my second example. For the first one, the leading ../ is dropped from the result.

Could you please help me modify the regular expression \b(\S+\.pdf)\b to achieve my goal? Thank you so much!

Answer 1

Use

import re

bases = ['"../matlab/license_admin.pdf" abc "vfv"', '"license_admin.pdf" xyz']
for base in bases:
    m = re.search(r'"(.*?\.pdf)', base)
    if m:
        print(m.group(1))

See the Python demo.

Output:

../matlab/license_admin.pdf
license_admin.pdf

The "(.*?\.pdf) pattern matches ", then captures into Group 1 zero or more characters other than line break characters, as few as possible, followed by .pdf. With re.search, you get the first match, and m.group(1) accesses the Group 1 value.
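To see why the original pattern loses the ../ prefix, it may help to run both patterns side by side. The snippet below is a minimal sketch that reuses the first sample string from the question; the variable names original and fixed are only illustrative. A \b can match only where a word character meets a non-word character, and ", . and / are all non-word characters, so the earliest place the original pattern can start is just before the m of matlab.

```python
import re

base = '"../matlab/license_admin.pdf" abc "vfv"'

# Original pattern: the leading \b cannot match between '"', '.' and '/',
# so the match begins at "matlab" and "../" is left out.
original = re.findall(r'\b(\S+\.pdf)\b', base)

# Anchoring on the opening quote instead keeps the whole path.
fixed = re.findall(r'"(.*?\.pdf)', base)

print(original)  # ['matlab/license_admin.pdf']
print(fixed)     # ['../matlab/license_admin.pdf']
```

re.findall is used here only so that both results print as lists; re.search with m.group(1) returns the same first match.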

See the regex demo.
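One caveat, offered as a hedged side note rather than as part of the accepted answer: .*? can run across quotation marks, so if a quoted token that is not a PDF comes before the file name, the capture starts at that earlier quote. The sketch below uses a made-up input string and a stricter variant, "([^"]*\.pdf)", which assumes the file name is always closed by a quote and never contains one itself.

```python
import re

# Hypothetical input: the first quoted token is not a PDF path.
text = '"vfv" abc "../matlab/license_admin.pdf" and "notes.pdf"'

# Lazy dot: the first match starts at the quote before vfv and
# runs across the intermediate quotes until it reaches ".pdf".
lazy = re.findall(r'"(.*?\.pdf)', text)

# Negated class: [^"]* cannot cross a quote, and the closing quote is required.
strict = re.findall(r'"([^"]*\.pdf)"', text)

print(lazy)
print(strict)  # ['../matlab/license_admin.pdf', 'notes.pdf']
```

For the two sample strings in the question, both patterns return the same values, so the stricter form matters only if the input can look like the made-up string above.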

Answer 1
1 accepted 2020-03-21 18:45:34
