Regex match strings divided by 'and'
I need to parse a string to get the required number and position (job title) for each item, for example:
2 Better Developers and 3 Testers
5 Mechanics and chef
medic and 3 nurses
Currently I'm using code like this, which returns a list of tuples such as [('2', 'Better Developers'), ('3', 'Testers')]:
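import re
from typing import List, Tuple

def parse_workers_list_from_str(string_value: str) -> List[Tuple[str, str]]:
    result: List[Tuple[str, str]] = []
    if string_value:
        # Split on the substring 'and', then pull (number, text) out of each part
        for part in string_value.split('and'):
            result.append(re.findall(r'(?: *)(\d+|)(?: |)([\w ]+)', part.strip())[0])
    return result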
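One caveat about the split-based version: str.split('and') splits on the substring wherever it occurs, including inside words. A quick check with a made-up input ('2 Commanders and 3 Testers' is illustrative, not from the question) compares it with re.split on 'and' surrounded by whitespace:

```python
import re

text = '2 Commanders and 3 Testers'

# str.split matches the substring 'and' anywhere, even inside "Commanders"
naive_parts = text.split('and')
# Requiring whitespace on both sides only splits on the standalone word
word_parts = re.split(r'\s+and\s+', text)

print(naive_parts)  # ['2 Comm', 'ers ', ' 3 Testers']
print(word_parts)   # ['2 Commanders', '3 Testers']
```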
Can I do this using only a regex, without .split()?
With re.MULTILINE you can do it all in a single regex, which also splits everything correctly:
>>> import re
>>> s = """2 Better Developers and 3 Testers
5 Mechanics and chef
medic and 3 nurses"""
>>> re.findall(r"\s*(\d*)\s*(.+?)(?:\s+and\s+|$)", s, re.MULTILINE)
[('2', 'Better Developers'), ('3', 'Testers'), ('5', 'Mechanics'), ('', 'chef'), ('', 'medic'), ('3', 'nurses')]
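To tie this back to the question, here is a sketch of the asker's parse_workers_list_from_str rewritten around the same pattern, with no str.split() call. Keeping the original function name and switching the annotation to typing's List/Tuple are choices made for this example, not part of the answer:

```python
import re
from typing import List, Tuple

# Pattern from the answer: optional count, lazily matched text,
# then either ' and ' (with surrounding whitespace) or the end of a line.
WORKER_RE = re.compile(r"\s*(\d*)\s*(.+?)(?:\s+and\s+|$)", re.MULTILINE)

def parse_workers_list_from_str(string_value: str) -> List[Tuple[str, str]]:
    """Return (count, position) tuples without calling str.split()."""
    if not string_value:
        return []
    return WORKER_RE.findall(string_value)

print(parse_workers_list_from_str("2 Better Developers and 3 Testers"))
# [('2', 'Better Developers'), ('3', 'Testers')]
```

Because 'and' must be preceded and followed by whitespace, a word such as "Commanders" is no longer split in the middle.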
The same approach with comments explaining each part, plus converting the empty '' to 1:
import re
s = """2 Better Developers and 3 Testers
5 Mechanics and chef
medic and 3 nurses"""
results = re.findall(r"""
# Capture the number if one exists
(\d*)
# Skip any spacing between the number and the text
\s*
# Capture the text (lazily, so it stops at the first ' and ' or line end)
(.+?)
# Attempt to match the word 'and' or the end of the line
(?:\s+and\s+|$\n?)
""", s, re.MULTILINE|re.VERBOSE)
results = [(int(n or 1), t.title()) for n, t in results]
assert results == [(2, 'Better Developers'), (3, 'Testers'), (5, 'Mechanics'), (1, 'Chef'), (1, 'Medic'), (3, 'Nurses')]
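A possible variation on the verbose pattern uses named groups with re.finditer, so the conversion reads by name instead of by tuple position. The group names count and title are illustrative choices, not from the answer; the pattern structure is otherwise unchanged:

```python
import re

s = """2 Better Developers and 3 Testers
5 Mechanics and chef
medic and 3 nurses"""

# Same structure as the verbose pattern in the answer, with named groups
pattern = re.compile(r"""
    (?P<count>\d*)        # optional number
    \s*                   # spacing between number and text
    (?P<title>.+?)        # the position text, matched lazily
    (?:\s+and\s+|$\n?)    # ' and ' or end of line (plus its newline)
""", re.MULTILINE | re.VERBOSE)

# Missing counts ('') default to 1, as in the answer
workers = [
    (int(m.group("count") or 1), m.group("title").title())
    for m in pattern.finditer(s)
]
print(workers)
```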
You can use this regex:
(\d*) *(\S+(?: \S+)*?) and (\d*) *(\S+(?: \S+)*)
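Here we match and surrounded by a single space on each side. Before and after and, we match with this sub-pattern:
(\d*) *(\S+(?: \S+)*?)
It matches optional leading digits (zero or more), followed by zero or more spaces, followed by one or more runs of non-whitespace characters separated by single spaces. The copy after and uses the greedy * instead of *?, since nothing follows it.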
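To see the sub-pattern on its own, here is a small check applying it with re.fullmatch to single items (part_rx is just an illustrative name). fullmatch forces the lazy group to expand until the whole item is consumed:

```python
import re

# The sub-pattern used on each side of ' and '
part_rx = re.compile(r'(\d*) *(\S+(?: \S+)*?)')

print(part_rx.fullmatch('2 Better Developers').groups())  # ('2', 'Better Developers')
print(part_rx.fullmatch('chef').groups())                 # ('', 'chef')
```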
Code:
import re

arr = ['2 Better Developers and 3 Testers', '5 Mechanics and chef', 'medic and 3 nurses', '5 foo']
rx = re.compile(r'(\d*) *(\S+(?: \S+)*?) and (\d*) *(\S+(?: \S+)*)')
for s in arr:
    print(rx.findall(s))
Output:
[('2', 'Better Developers', '3', 'Testers')]
[('5', 'Mechanics', '', 'chef')]
[('', 'medic', '3', 'nurses')]
[]
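This pattern needs exactly one ' and ' per string, which is why '5 foo' yields []. A hypothetical variant, not part of the original answer, makes the second half optional and uses re.fullmatch so each string is matched as a whole. It still handles at most two items per string, and unmatched groups come back as None:

```python
import re

# Hypothetical variant: the ' and <count> <position>' half is optional
rx = re.compile(r'(\d*) *(\S+(?: \S+)*?)(?: and (\d*) *(\S+(?: \S+)*))?')

for s in ['2 Better Developers and 3 Testers', 'medic and 3 nurses', '5 foo']:
    m = rx.fullmatch(s)
    print(m.groups() if m else None)
# ('2', 'Better Developers', '3', 'Testers')
# ('', 'medic', '3', 'nurses')
# ('5', 'foo', None, None)
```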