I'm going crazy over a fairly simple problem: I have a list of lists that I want to split. There's a fairly easy pattern, but with a variation that I can't seem to capture:
[['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 +10.000 XX2IY00 600.00–700.00 800.00 0.000'],
['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 XX2IY00 600.00–700.00 800.00 0.000']]
The general pattern in the list elements is code, range, value, change. As you can see, there's a variation in the pattern in the second list, because its second record only has code, range, value. In order to split these lists, I use this regex:
for element in list:
    final_list.append(re.split(r'([A-Z]{2}[A-Z0-9]{1}[A-Z]{2}[A-Z0-9]{2}\s\S*\s\S*\s\S*)\s', element))
However, this fails on the second list because I have:
[['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000 +10.000, XX2IY00 600.00–700.00 800.00 0.000'],
['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000 XX2IY00, 600.00–700.00 800.00 0.000']]
While the expected result is:
[['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000 +10.000, XX2IY00 600.00–700.00 800.00 0.000'],
['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000, XX2IY00 600.00–700.00 800.00 0.000']]
Which regex pattern would allow me to do this?
To clarify: I want a list such that if I read its content element-wise I have:
XXOOY00 100.00–200.00 300.000 -1.000
XX0IY00 300.00–400.00 500.000 +10.000
XX2IY00 600.00–700.00 800.00 0.000
XXOOY00 100.00–200.00 300.000 -1.000
XX0IY00 300.00–400.00 500.000
XX2IY00 600.00–700.00 800.00 0.000
Thank you.
You could use the fact that your optional 'change' field is built of digits, +, - and a decimal point, which can be expressed in a regex pattern as: \s[0-9\+\-\.]+
(including the preceding space is convenient)
Now you want zero or one occurrence of this pattern: (\s[0-9\+\-\.]+)?
This needs grouping (parentheses), but you do not want that group captured and added separately to your resulting list. So you must make it a non-capturing group: (?:\s[0-9\+\-\.]+)?
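To see why the inner group should be non-capturing, here is a small sketch comparing the two variants with re.split. The shortened sample string and the variable names are only illustrative assumptions; the patterns are written as raw strings, so single backslashes are used.

```python
import re

# Shortened sample: the first record has no change field, the second one does.
text = 'XX0IY00 300.00–400.00 500.000 XX2IY00 600.00–700.00 800.00 0.000'

code = r'[A-Z]{2}[A-Z0-9][A-Z]{2}[A-Z0-9]{2}'

# Inner group is capturing: re.split returns it as an extra item
# (None when the optional group did not take part in the match).
capturing = '(' + code + r'\s\S+\s\S+(\s[0-9+\-.]+)?)'
with_capture = re.split(capturing, text)

# Inner group is non-capturing: only the outer group is returned.
non_capturing = '(' + code + r'\s\S+\s\S+(?:\s[0-9+\-.]+)?)'
without_capture = re.split(non_capturing, text)

print(with_capture)
print(without_capture)
```

With the capturing variant the result contains stray items such as None and ' 0.000' next to each record. The non-capturing variant only leaves the empty strings and single spaces that re.split puts between matches, which are easy to filter out.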
A complete pattern could be:
r'([A-Z]{2}[A-Z0-9][A-Z]{2}[A-Z0-9]{2}\s\S+\s\S+(?:\s[0-9\+\-\.]+)?)'
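As a rough sketch of how this could be applied to the data from the question: the example below uses re.findall instead of re.split (a substitution on my part, since findall returns just the matched records without the separators in between). The names RECORD and data are illustrative, and the code assumes each inner list holds exactly one string, as in the question.

```python
import re

# Same pattern as above, split over several raw-string pieces for readability.
# No capturing group is needed here because findall returns whole matches.
RECORD = re.compile(
    r'[A-Z]{2}[A-Z0-9][A-Z]{2}[A-Z0-9]{2}'  # code, e.g. XX0IY00
    r'\s\S+'                                # range
    r'\s\S+'                                # value
    r'(?:\s[0-9+\-.]+)?'                    # optional change
)

data = [
    ['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 +10.000 XX2IY00 600.00–700.00 800.00 0.000'],
    ['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 XX2IY00 600.00–700.00 800.00 0.000'],
]

# One list of records per input row.
final_list = [RECORD.findall(row[0]) for row in data]

for records in final_list:
    for record in records:
        print(record)
```

This works because a code always starts with two letters, so the optional change group (digits, signs and dots only) can never swallow the start of the next record.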