Python Regex: how to extract the string between parentheses and quotes (if they exist)

Question

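I am trying to extract the value/argument of each trigger in Jenkinsfiles, taken from between the parentheses and, if present, the quotes.

For example, given the following: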

upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')     # single quotes and parentheses

The desired results are, respectively:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *

My current results:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *'        # Notice the trailing single quote

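So far I have succeeded with the first trigger (the upstream one), but not with the second one (pollSCM), because a trailing single quote is still there.

After the group (.+), the pattern does not consume the trailing single quote with \'*, yet it does consume the closing parenthesis with \). I could simply remove the quote with .replace() or .strip(), but what is wrong with my regex pattern, and how can I improve it? Here is my code: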

import re

# Optional letters, "(", optional quotes, a greedy capture group, optional quotes, ")"
pattern = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"
text1 = r"upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = r"pollSCM('H * * * *')"
trigger_value1 = re.search(pattern, text1).group(1)
trigger_value2 = re.search(pattern, text2).group(1)  # still ends with a single quote

Answer 1

import re
s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')"""
# Raw string so that \( and \) reach the regex engine unchanged
print(re.findall(r"\((.*?)\)", s))

Output:

["upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS", "'H * * * *'"]
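Note that the second result above still carries its surrounding single quotes. The following is a minimal sketch of one way to drop them afterwards, assuming a quote pair should only be removed when it wraps the whole value. The helper name unquote is hypothetical and not part of the original answer.

```python
import re

def unquote(value):
    """Remove one pair of matching quotes that wraps the whole value (hypothetical helper)."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value

s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')"""

# Capture everything between "(" and the next ")", then strip wrapping quotes.
values = [unquote(v) for v in re.findall(r"\((.*?)\)", s)]
print(values)
```

The inner quotes around upstreamJob are left alone because they do not wrap the entire captured value.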

Answer 2

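Your \'* part means 0 or more matches of the single quote, so .+ grabs the last ' because it is greedy: once .+ has swallowed the quote, \'* is satisfied by matching nothing. You need to add ? to (.+) so that it is not greedy. Essentially, it then means "grab as little as possible until the rest of the pattern, starting with ', can match".

This pattern will work for you: [A-Za-z]*\(\'*\"*(.+?)\'*\"*\)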
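To see the difference side by side, here is a small sketch that runs the question's greedy pattern and the lazy variant on the two sample strings from the question. The variable names are illustrative only.

```python
import re

greedy = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"   # pattern from the question
lazy = r"[A-Za-z]*\(\'*\"*(.+?)\'*\"*\)"    # same pattern with a lazy group

text1 = "upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = "pollSCM('H * * * *')"

greedy_results = [re.search(greedy, t).group(1) for t in (text1, text2)]
lazy_results = [re.search(lazy, t).group(1) for t in (text1, text2)]

print(greedy_results[1])  # H * * * *'   (trailing quote kept)
print(lazy_results[1])    # H * * * *
print(lazy_results[0])    # upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
```

For the first string both patterns give the same result, because the lazy group still has to expand until the optional quotes and the closing parenthesis can match.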

[UPDATE]

To answer your follow-up question from the comments below, I am adding it here.

So the ? will make it not greedy up until the next character indicated in the pattern?

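Yes, it basically changes the repetition operator to be non-greedy (a lazy quantifier), because quantifiers are greedy by default. So .*?a matches everything up to the first a, while .*a matches everything, including every a it finds along the way, up to the last a in the string. If your string is aaaaaaaa and the regex is .*?a, it will actually match each a separately. For example, if you replace every match of .*?a in the string aaaaaaaa with b, you get the string bbbbbbbb. Doing the same replacement with .*a on the string aaaaaaaa, however, returns a single b.

Here is a link that explains the different quantifier types (greedy, lazy, possessive): http://www.rexegg.com/regex-quantifiers.html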
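The replacement example can be checked directly with re.sub; this is just a quick sketch of the behaviour described above.

```python
import re

text = "aaaaaaaa"

# Lazy: each match is a single "a", so every "a" is replaced separately.
lazy_sub = re.sub(r".*?a", "b", text)

# Greedy: one match covers the whole string, so only one "b" remains.
greedy_sub = re.sub(r".*a", "b", text)

print(lazy_sub)    # bbbbbbbb
print(greedy_sub)  # b
```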

Answer 3

For your example data, you could make the ' optional with '?, capture your value in a group, and then loop through the captured groups.

\('?(.*?)'?\)

import re

# Optional quote after "(" and before ")"; the lazy group captures the value.
regex = r"\('?(.*?)'?\)"

test_str = ("upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses\n"
    "pollSCM('H * * * *')     # single quotes and parentheses")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches:
    # Print every captured group of each match.
    for group in match.groups():
        print(group)


That will give you:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *

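For a stricter match, you could use an alternation that matches either () or ('') but not a single unbalanced ' as in ('H * * * *), and then loop through the captured groups. Because you now capture 2 groups, and only 1 of them takes part in any given match (Python returns None for the other), you can check for the one non-empty group and use that.

\((?:'(.*?)'|([^'].*?[^']))\)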
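As a rough sketch of how the stricter pattern could be used, the snippet below picks whichever of the two groups took part in each match. The third input line with the unbalanced quote and the variable names are added here for illustration and are not from the original answer.

```python
import re

# Either a fully quoted value (group 1) or an unquoted value (group 2).
strict = r"\((?:'(.*?)'|([^'].*?[^']))\)"

test_str = (
    "upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)\n"
    "pollSCM('H * * * *')\n"
    "pollSCM('H * * * *)"  # unbalanced quote, not expected to match
)

values = []
for match in re.finditer(strict, test_str):
    # Only one group participates in a match; the other one is None.
    quoted, unquoted = match.groups()
    values.append(quoted if quoted is not None else unquoted)

print(values)
```

One limitation of this pattern, as written, is that the unquoted alternative needs at least two characters between the parentheses.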


Answer 1
2 2018-05-04 04:36:15

Answer 2
0 2018-05-04 04:44:27

Answer 3
0 2018-05-04 09:00:11