
Parsing strings and integers (w/ sets) from a string to a list using Python

I want to take a string that looks like this: 'Oreo.12.37-40.Apple.78' and turn it into a list that looks like this:

['Oreo', 12, 37, 38, 39, 40, 'Apple', 78]

The '-' denotes an inclusive range of integers ('37-40' becomes [37, 38, 39, 40]).

https://stackoverflow.com/a/5705014/1099935 has a really nice solution, but I don't understand it well enough to know how to add string handling. (It works great with numbers only, but int() fails when it is given a string.)

Comments are locked for me(?), so here is an additional note: I need the list to contain either int values or strings. Then, using Q objects in the filters, items can be filtered by common name or by the assigned product key (generally short and also commonly used).

Here is an option:

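def custom_split(s):
    def int_range_expand(s):
        try:
            # Plain integer, e.g. '12'
            return [int(s)]
        except ValueError:
            try:
                # Inclusive range, e.g. '37-40'
                start, end = map(int, s.split('-'))
                # list() is required on Python 3, where range() is not a list
                return list(range(start, end + 1))
            except Exception:
                pass
        # Anything else is kept as a string
        return [s]
    return sum(map(int_range_expand, s.split('.')), [])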

>>> custom_split('Oreo.12.37-40.Apple.78')
['Oreo', 12, 37, 38, 39, 40, 'Apple', 78]

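This uses an EAFP ("easier to ask forgiveness than permission") approach: try the conversion and handle the exception if it fails. The steps are broken down below: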

1. custom_split('Oreo.12.37-40.Apple.78')
2. s <= 'Oreo.12.37-40.Apple.78'
3. s.split('.') => ['Oreo', '12', '37-40', 'Apple', '78']
4. map(int_range_expand, ['Oreo', '12', '37-40', 'Apple', '78'])
       => [['Oreo'], [12], [37, 38, 39, 40], ['Apple'], [78]]
5. sum([['Oreo'], [12], [37, 38, 39, 40], ['Apple'], [78]], [])
       => ['Oreo', 12, 37, 38, 39, 40, 'Apple', 78]

The int_range_expand() function from step 4 always returns a list. If the argument is a string or an int the result will only have one element, but if it is a range like 37-40 then it will contain each integer in that range. This allows us to chain all of the resulting lists into a single list easily.
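To see why the "always a list" rule matters, here is a small sketch of step 5 on its own. The variable names parts, flat and mixed_failed are just for illustration; the data is the step 4 result written out by hand.

```python
# The per-token results from step 4, written out by hand.
parts = [['Oreo'], [12], [37, 38, 39, 40], ['Apple'], [78]]

# sum() starts from the empty list and applies + to each sublist in turn:
# [] + ['Oreo'] + [12] + [37, 38, 39, 40] + ['Apple'] + [78]
flat = sum(parts, [])
print(flat)

# If one element were a bare int instead of a one-element list,
# list + int would raise TypeError.
try:
    sum([['Oreo'], 12], [])
    mixed_failed = False
except TypeError:
    mixed_failed = True
print(mixed_failed)
```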

Step 5 is similar to itertools.chain, which is more efficient (sum() builds a new list at every addition) but requires importing a module. Which one is preferable is up to you.
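For comparison, a minimal sketch of the itertools.chain variant. The names expand_token and custom_split_chain are made up for this example and are not part of the answer above; the logic is meant to mirror int_range_expand().

```python
from itertools import chain


def expand_token(token):
    """Return [int], an inclusive list of ints, or [token] unchanged."""
    try:
        return [int(token)]
    except ValueError:
        pass
    try:
        # Raises ValueError for non-numeric parts or a wrong number of parts.
        start, end = map(int, token.split('-'))
    except ValueError:
        return [token]
    return list(range(start, end + 1))


def custom_split_chain(s):
    # chain.from_iterable concatenates the per-token lists in one pass
    # instead of building an intermediate list for every addition.
    return list(chain.from_iterable(expand_token(t) for t in s.split('.')))


print(custom_split_chain('Oreo.12.37-40.Apple.78'))
```

For a handful of tokens the difference from sum() is negligible; it only starts to matter for long inputs.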

This does it:

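#!/usr/bin/env python3

import re

s = 'Oreo.12.37-40.Apple.78'
items = []
# Split on runs of characters that are neither word characters nor '-'
for e in re.split(r'[^\w-]+', s):
    m = re.match(r'(\d+)-(\d+)', e)
    if m:
        # Range such as '37-40': append every integer, end inclusive
        x = int(m.group(1))
        y = int(m.group(2))
        for i in range(x, y + 1):
            items.append(i)
    else:
        try:
            items.append(int(e))
        except ValueError:
            # Not a number, keep it as a string
            items.append(e)

print(items)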

Output:

['Oreo', 12, 37, 38, 39, 40, 'Apple', 78]
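One possible variation, offered as a sketch rather than a fix: re.match only anchors at the start of the token, so a token like '12-34x' would be treated as the range 12 to 34. If product keys can look like that, re.fullmatch keeps them as strings. The names RANGE_RE and parse_items are hypothetical, and this version splits on '.' only, which is an assumption about the input.

```python
import re

# Matches only tokens that consist entirely of <digits>-<digits>.
RANGE_RE = re.compile(r'(\d+)-(\d+)')


def parse_items(s):
    items = []
    for token in s.split('.'):
        m = RANGE_RE.fullmatch(token)
        if m:
            start, end = int(m.group(1)), int(m.group(2))
            items.extend(range(start, end + 1))
            continue
        try:
            items.append(int(token))
        except ValueError:
            # Not an integer and not a pure range, keep the string.
            items.append(token)
    return items


print(parse_items('Oreo.12.37-40.Apple.78'))
print(parse_items('Oreo.12-34x.5'))
```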
