Python Regex: How to extract the string between parentheses and quotes (if they exist)

Question

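I am trying to extract the value/arguments of each trigger in Jenkinsfiles: the text between the parentheses, and between the quotes if they exist.

For example, given the following: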

upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')     # single quotes and parentheses

The desired results are, respectively:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *

My current results:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *'        # Notice the trailing single quote

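So far I have succeeded with the first trigger (the upstream one), but not with the second (pollSCM), because a trailing single quote is still left in the result.

After the group (.+), the pattern does not capture the trailing single quote with \'*, but it does capture the closing parenthesis with \). I could simply remove the quote with .replace() or .strip(), but what is wrong with my regex pattern? How can I improve it? Here is my code: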

import re

pattern = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"
text1 = r"upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = r"pollSCM('H * * * *')"
trigger_value1 = re.search(pattern, text1).group(1)
trigger_value2 = re.search(pattern, text2).group(1)

Answer 1

import re
s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')"""
print(re.findall(r"\((.*?)\)", s))  # raw string, so \( and \) reach the regex engine unchanged

Output:

["upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS", "'H * * * *'"]
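Note that this output still contains the quotes around the pollSCM value. As a minimal sketch, assuming you only want to drop one matching pair of surrounding quotes, the result could be post-processed as follows. The helper name unquote is hypothetical and not part of the original answer:

```python
import re

s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')"""

def unquote(value):
    """Drop one pair of matching quotes that wraps the whole value (hypothetical helper)."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value

raw_values = re.findall(r"\((.*?)\)", s)
values = [unquote(v) for v in raw_values]
print(values)
```

Unlike a plain .strip("'"), this leaves values alone when only one end happens to be a quote.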

Answer 2

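Your \'* part means 0 or more matches of the single quote. Because it is allowed to match zero quotes, the .+ before it grabs the last ' as well, since .+ is greedy. You need to add ? to (.+) so that it is not greedy. Basically, it then means: grab everything up until it hits the '.

This pattern will work for you: [A-Za-z]*\(\'*\"*(.+?)\'*\"*\)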
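A minimal sketch comparing the greedy and the lazy version of the pattern on the inputs from the question. The variable names greedy and lazy are chosen here only for illustration:

```python
import re

# Pattern from the question: the group (.+) is greedy.
greedy = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"
# Same pattern with a lazy group: (.+?)
lazy = r"[A-Za-z]*\(\'*\"*(.+?)\'*\"*\)"

text1 = r"upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = r"pollSCM('H * * * *')"

greedy_value2 = re.search(greedy, text2).group(1)
lazy_value1 = re.search(lazy, text1).group(1)
lazy_value2 = re.search(lazy, text2).group(1)

print(greedy_value2)  # H * * * *'   (trailing quote is kept)
print(lazy_value2)    # H * * * *
print(lazy_value1)    # upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
```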

[UPDATE]

To answer your question below, I am adding it here.

So the ? will make it not greedy up until the next character indicated in the pattern?

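Yes, it basically changes the repetition operators to be non-greedy (lazy quantifiers), because they are greedy by default. So .*?a matches everything up to the first a, while .*a matches everything, including any a found in the string, up to the last a it can reach. So if your string is aaaaaaaa and the regex is .*?a, it will actually match every a separately. For example, if you use .*?a on the string aaaaaaaa and substitute b for every match, you get the string bbbbbbbb. With .*a, however, the same substitution on aaaaaaaa returns a single b.

Here is a link that explains the different quantifier types (greedy, lazy, possessive): http://www.rexegg.com/regex-quantifiers.html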
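The substitution example can be checked with a short sketch using re.sub from the standard library. The variable names are illustrative only:

```python
import re

s = "aaaaaaaa"

# Lazy: .*? matches as little as possible, so every single "a" is its own match.
lazy_result = re.sub(r".*?a", "b", s)
# Greedy: .* runs to the end and backtracks to the last "a", giving one match.
greedy_result = re.sub(r".*a", "b", s)

print(lazy_result)    # bbbbbbbb
print(greedy_result)  # b
```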

Answer 3

For your example data, you could make the ' optional with '? and capture your values in a group, then loop through the captured groups.

\('?(.*?)'?\)

import re

regex = r"\('?(.*?)'?\)"

test_str = ("upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses\n"
    "pollSCM('H * * * *')     # single quotes and parentheses")

matches = re.finditer(regex, test_str, re.MULTILINE)

# Print every capture group of every match (groups are numbered from 1).
for match in matches:
    for group_num in range(1, len(match.groups()) + 1):
        print(match.group(group_num))


That will give you:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *

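To get a stricter match, you could use an alternation that matches either () or ('') but not a single ', as in ('H * * * *), and then loop through the captured groups. Because you now capture 2 groups, of which 1 is empty for any given match, you can check that you only retrieve the non-empty group.

\((?:'(.*?)'|([^'].*?[^']))\)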
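A minimal sketch of how the stricter pattern could be used. In Python's re module a group that did not take part in the match is reported as None, so the sketch picks whichever group is not None. The helper name extract and the third, deliberately malformed input are assumptions added for illustration:

```python
import re

strict = r"\((?:'(.*?)'|([^'].*?[^']))\)"

lines = [
    "upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)",
    "pollSCM('H * * * *')",
    "pollSCM('H * * * *)",  # unbalanced quote: expected not to match
]

def extract(line):
    """Return the captured value, or None when the line does not match (hypothetical helper)."""
    match = re.search(strict, line)
    if match is None:
        return None
    # Only one of the two groups participates in a match; the other is None.
    return match.group(1) if match.group(1) is not None else match.group(2)

values = [extract(line) for line in lines]
print(values)
```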

