Python: create a list of tuples of strings by splitting on a regex pattern

Suppose I have these two strings:

s1 = 'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)'
s2 = 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)'

I want to use a regex to match the pattern (number-words) and then split the strings to get a list of tuples:

final = [('hello 4, this is stackoverflow, looking for help', '1345-today is wednesday'),
         ('hello again, this is a (bit-more complicated), string', '67890123 - tomorrow is thursday')]

I tried \([0-9]+-(.*?)\) but without success.

What am I doing wrong? Any ideas for a workaround?

Thank you in advance!

This might nudge you in the right direction:

>>> import re
>>> re.findall(r'^(.*) \((.+?)\)$', s1)
[('hello 4, this is stackoverflow, looking for help', '1345-today is wednesday')]
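To build the whole `final` list from the question, the same pattern can be applied to each string in turn. This is only a minimal sketch: it assumes every input ends with the parenthesised part, and it does not check that the part starts with a number. The names `pattern` and `final` are illustrative.

```python
import re

s1 = 'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)'
s2 = 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)'

# The greedy (.*) backtracks to the last " (", so earlier parentheses stay in group 1.
pattern = re.compile(r'^(.*) \((.+?)\)$')

final = []
for s in (s1, s2):
    m = pattern.match(s)
    if m:  # skip strings that do not end with " (...)"
        final.append(m.groups())

print(final)
```

Each `m.groups()` call returns a tuple, so `final` ends up as a list of tuples rather than a list of lists.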

You may use this regex in findall:

>>> import re
>>> regx = re.compile(r'^(.*?)\s*\((\d+\s*-\s*\w+[^)]*)\)')
>>> arr = ['hello 4, this is stackoverflow, looking for help (1345-today is wednesday)', 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)']
>>> for el in arr:
...     regx.findall(el)
...
[('hello 4, this is stackoverflow, looking for help', '1345-today is wednesday')]
[('hello again, this is a (bit-more complicated), string', '67890123 - tomorrow is thursday')]

RegEx Details:

  • ^(.*?) : Lazily match 0 or more characters from the start of the string and capture them in group #1
  • \s* : Match 0 or more whitespace characters
  • \((\d+\s*-\s*\w+[^)]*)\) : Match a (<number>-word ..) string and capture what is inside the brackets in capture group #2
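As a rough illustration of how this pattern could be wrapped up, the sketch below puts it in a small helper that returns a tuple, or None when no (number-words) part is found. The helper name `split_record` and the extra test string are hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer above.
regx = re.compile(r'^(.*?)\s*\((\d+\s*-\s*\w+[^)]*)\)')

def split_record(text):
    """Return (prefix, inside_parens), or None if no "(number-words)" part is found."""
    m = regx.search(text)
    return m.groups() if m else None

arr = [
    'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)',
    'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)',
]

# Keep only the strings that actually contain the pattern.
final = [pair for pair in map(split_record, arr) if pair is not None]
print(final)

# A parenthesised part that does not start with digits is not matched.
print(split_record('no number here (bit-more complicated)'))
```

Because the pattern requires digits straight after the opening bracket, "(bit-more complicated)" in the second string is skipped and stays in the first element of the tuple.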

Alternatively, you may use this regex in split:

>>> import re
>>> reg = re.compile(r'(?<!\s)\s*(?=\((\d+\s*-\s*\w+[^)]*)\))')
>>> for el in arr:
...     reg.split(el)[:-1]
...
['hello 4, this is stackoverflow, looking for help', '1345-today is wednesday']
['hello again, this is a (bit-more complicated), string', '67890123 - tomorrow is thursday']


RegEx Details:

  • (?<!\s) : Negative lookbehind asserting that the previous character is not whitespace
  • \s* : Match 0 or more whitespace characters
  • (?=\((\d+\s*-\s*\w+[^)]*)\)) : Lookahead asserting that a (<number>-word ..) string follows. Note that we use a capture group so that the string inside (...) appears in the result of split.
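To see why the `[:-1]` slice is needed, it may help to look at the unsliced result of split. The lookahead consumes nothing, so the original "(...)" text is left over as a trailing element. A small sketch, with `parts` and `pair` as illustrative names:

```python
import re

reg = re.compile(r'(?<!\s)\s*(?=\((\d+\s*-\s*\w+[^)]*)\))')

s1 = 'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)'

# split returns: text before the match, the captured group, then the
# remainder of the string (the untouched "(...)" that the lookahead only peeked at).
parts = reg.split(s1)
print(parts)

# Dropping the last element and converting to a tuple gives the shape asked for in the question.
pair = tuple(parts[:-1])
print(pair)
```

Wrapping the slice in `tuple(...)` is optional. It only matters if the result must be a tuple rather than a list.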
