Python create list of tuples of strings by splitting from regex pattern

Question

Suppose I have these two strings:

s1 = 'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)'
s2 = 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)'

I want to use regex to match the pattern (number-words) and then split the strings to get a list of tuples:

final = [('hello 4, this is stackoverflow, looking for help', '1345-today is wednesday'),
         ('hello again, this is a (bit-more complicated), string', '67890123 - tomorrow is thursday')]

I tried with \([0-9]+-(.*?)\) but without success.

What am I doing wrong? Any ideas for a workaround?

Thank you in advance!!

Answer 1

This might nudge you in the right direction:

>>> import re
>>> re.findall(r'^(.*) \((.+?)\)$', s1)
[('hello 4, this is stackoverflow, looking for help', '1345-today is wednesday')]
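To get the list of tuples asked for in the question, the same pattern can be applied to each string in turn. Below is a minimal sketch, assuming s1 and s2 are as defined in the question; the names pattern and final are only illustrative. Note that this pattern just requires a bracketed group at the very end of the string and does not check that the group starts with a number.

```python
import re

s1 = 'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)'
s2 = 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)'

# Greedy (.*) backtracks to the last " (" whose closing bracket ends the string.
pattern = re.compile(r'^(.*) \((.+?)\)$')

final = []
for s in (s1, s2):
    m = pattern.match(s)
    if m:  # strings without a trailing "(...)" are skipped
        final.append(m.groups())  # groups() returns a tuple of both captures

print(final)
```

Because the first group is greedy, the earlier bracket in s2, (bit-more complicated), stays inside the first element of the tuple.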

Answer 2

You may use this regex with findall:

>>> import re
>>> regx = re.compile(r'^(.*?)\s*\((\d+\s*-\s*\w+[^)]*)\)')
>>> arr = ['hello 4, this is stackoverflow, looking for help (1345-today is wednesday)', 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)']
>>> for el in arr:
...     regx.findall(el)
...
[('hello 4, this is stackoverflow, looking for help', '1345-today is wednesday')]
[('hello again, this is a (bit-more complicated), string', '67890123 - tomorrow is thursday')]

RegEx Details:

^(.*?) : Match 0 or more characters at the start, as few as possible, and capture them in group #1
\s* : Match 0 or more whitespace characters
\((\d+\s*-\s*\w+[^)]*)\) : Match a (<number>-word ..) string and capture what is inside the brackets in capture group #2
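The breakdown above can also be written into the pattern itself using re.VERBOSE. Below is a sketch, assuming the same two strings as in the question; the names regx_verbose and attempt are only illustrative. It also runs the pattern from the question for comparison: that pattern allows no whitespace around the hyphen, so it finds nothing in the second string, and it captures only the text after the hyphen rather than the leading text.

```python
import re

s1 = 'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)'
s2 = 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)'

# Same pattern as above, spread out with comments (whitespace is ignored in VERBOSE mode).
regx_verbose = re.compile(r'''
    ^(.*?)          # group 1: leading text, as short as possible
    \s*             # optional whitespace before the bracket
    \(              # literal opening bracket
    (               # group 2: the contents of the bracket
        \d+         # a number
        \s*-\s*     # a hyphen, optionally surrounded by whitespace
        \w+         # at least one word character
        [^)]*       # anything up to the closing bracket
    )
    \)              # literal closing bracket
''', re.VERBOSE)

final = [regx_verbose.search(s).groups() for s in (s1, s2)]
print(final)

# The attempt from the question, for comparison.
attempt = re.compile(r'\([0-9]+-(.*?)\)')
print(attempt.search(s1).group(1))  # only the part after the hyphen
print(attempt.search(s2))           # None: "67890123 - " has spaces around the hyphen
```

The list comprehension assumes every string matches; for arbitrary input, check the result of search for None first.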

Alternatively, you may use this regex with split:

>>> import re
>>> reg = re.compile(r'(?<!\s)\s*(?=\((\d+\s*-\s*\w+[^)]*)\))')
>>> for el in arr:
...     reg.split(el)[:-1]
...
['hello 4, this is stackoverflow, looking for help', '1345-today is wednesday']
['hello again, this is a (bit-more complicated), string', '67890123 - tomorrow is thursday']


RegEx Details:

(?<!\s) : Assert that the previous character is not whitespace
\s* : Match 0 or more whitespace characters
(?=\((\d+\s*-\s*\w+[^)]*)\)) : Lookahead to assert that the string ahead of us is (<number>-word ..). Note that we are using a capture group so that the string inside (...) appears in the result of split.
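Since split returns lists, one more step is needed to get the tuples requested in the question. Below is a sketch under the same assumptions as before; the helper name split_pair is hypothetical and not part of the answer.

```python
import re

s1 = 'hello 4, this is stackoverflow, looking for help (1345-today is wednesday)'
s2 = 'hello again, this is a (bit-more complicated), string (67890123 - tomorrow is thursday)'

reg = re.compile(r'(?<!\s)\s*(?=\((\d+\s*-\s*\w+[^)]*)\))')

def split_pair(text):
    """Return (leading text, bracket contents) for one string."""
    # For a matching string, split yields three parts:
    # [leading text, captured bracket contents, the remaining "(...)" text].
    parts = reg.split(text)
    # Keep the first two; a string with no match yields a 1-tuple instead.
    return tuple(parts[:2])

final = [split_pair(s) for s in (s1, s2)]
print(final)
```

The lookahead consumes nothing, so the bracketed text itself is left over as the last element of the split, which is why the original snippet drops it with [:-1].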

solution1
0 2020-10-28 16:06:47

solution2
0 ACCEPTED 2020-10-28 16:14:10