
Split string with Regex on variable pattern

I'm going crazy over a fairly simple problem: I have a list of lists that I want to split. There's a fairly easy pattern, but with a variation that I can't seem to capture:

 [['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 +10.000 XX2IY00 600.00–700.00 800.00 0.000'],
['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 XX2IY00 600.00–700.00 800.00 0.000']]

The general pattern in the list elements is code, range, value, change. As you can see, there's a variation in the pattern in the second list, because its second record only has code, range, value. In order to split these lists, I use this regex:

for element in list:
    final_list.append(re.split(r'([A-Z]{2}[A-Z0-9][A-Z]{2}[A-Z0-9]{2}\s\S*\s\S*\s\S*)\s', element))

However, this fails on the second list because I get:

[['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000 +10.000, XX2IY00 600.00–700.00 800.00 0.000'],
['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000 XX2IY00, 600.00–700.00 800.00 0.000']]

While the expected result is:

[['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000 +10.000, XX2IY00 600.00–700.00 800.00 0.000'],
    ['XXOOY00 100.00–200.00 300.000 -1.000, XX0IY00 300.00–400.00 500.000, XX2IY00 600.00–700.00 800.00 0.000']]

Which regex pattern would allow me to do this?

To clarify: I want a list such that, if I read its content element-wise, I have:

XXOOY00 100.00–200.00 300.000 -1.000
XX0IY00 300.00–400.00 500.000 +10.000
XX2IY00 600.00–700.00 800.00 0.000
XXOOY00 100.00–200.00 300.000 -1.000
XX0IY00 300.00–400.00 500.000
XX2IY00 600.00–700.00 800.00 0.000

Thank you.

You could use the fact that your optional 'change' field is built of digits, +, - and a decimal point, which can be expressed in a regex pattern as: \s[0-9\+\-\.]+ (including the preceding space is convenient).

Now you want zero or one occurrence of this pattern: (\s[0-9\+\-\.]+)?

This needs grouping (parentheses), but you do not want that group captured and added separately to your resulting list. So you must make it a non-capturing group: (?:\s[0-9\+\-\.]+)?
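To illustrate why the group should be non-capturing, here is a small sketch on a made-up, simplified record format (two-letter code, range, optional change). The sample string and variable names are assumptions for illustration only and are not taken from the question's data.

```python
import re

# Hypothetical simplified input: code, range, optional change.
sample = "AB 1-2 -1.0 CD 3-4"

# Inner group is capturing: re.split adds its text (or None when the
# optional group did not take part in the match) as extra list items.
capturing = re.split(r"([A-Z]{2}\s\S+(\s[0-9+\-.]+)?)", sample)

# Inner group is non-capturing: only the outer group ends up in the list.
non_capturing = re.split(r"([A-Z]{2}\s\S+(?:\s[0-9+\-.]+)?)", sample)

# re.split also returns the text between matches (empty strings and
# single spaces here), so drop the blank pieces.
records = [piece for piece in non_capturing if piece and piece.strip()]

print(capturing)
print(records)
```

With the capturing variant, the optional field appears a second time as its own item, which is the duplication the non-capturing form avoids.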

A complete pattern could be:

'([A-Z]{2}[A-Z0-9][A-Z]{2}[A-Z0-9]{2}\s\S+\s\S+(?:\s[0-9\+\-\.]+)?)'
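As a rough sketch of how this pattern could be applied to the data from the question: the version below uses re.findall instead of re.split (a substitution on my part, since findall returns only the matched records and no leftover separators), and therefore drops the outer capturing group. The names RECORD, data and split_records are hypothetical.

```python
import re

# Same pattern as above, written as a raw string; +, - and . need no
# escaping inside a character class except for the hyphen.
RECORD = re.compile(
    r"[A-Z]{2}[A-Z0-9][A-Z]{2}[A-Z0-9]{2}"  # code, e.g. XX0IY00
    r"\s\S+"                                # range
    r"\s\S+"                                # value
    r"(?:\s[0-9+\-.]+)?"                    # optional change
)

data = [
    ['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 +10.000 XX2IY00 600.00–700.00 800.00 0.000'],
    ['XXOOY00 100.00–200.00 300.000 -1.000 XX0IY00 300.00–400.00 500.000 XX2IY00 600.00–700.00 800.00 0.000'],
]

def split_records(rows):
    """Return one list of records per inner list of strings."""
    return [
        [record for text in row for record in RECORD.findall(text)]
        for row in rows
    ]

result = split_records(data)
for row in result:
    for record in row:
        print(record)
```

Because a code always starts with two letters, the optional change group cannot swallow the next code by mistake: after the whitespace it requires at least one digit, sign or dot.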
