
Splitting a list by matching a regex to an element

I have a list that has some specific elements in it. I would like to split that list into 'sublists' or different lists based on those elements. For example:

test_list = ['a and b, 123','1','2','x','y','Foo and Bar, gibberish','123','321','June','July','August','Bonnie and Clyde, foobar','today','tomorrow','yesterday']

I would like to start a new sublist at every element that matches 'something and something', so that the result is:

new_list = [['a and b, 123', '1', '2', 'x', 'y'], ['Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']]

So far I can accomplish this if there is a fixed number of items after the specific element. For example:

import re
element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')
new_list = [test_list[i:(i+4)] for i, x in enumerate(test_list) if element_regex.match(x)]

This is almost there, but there are not always exactly three elements following the specific element of interest. Is there a better way than just looping over every single item?
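The index-based idea in the question can be extended to variable-length groups. Collect the positions of every matching element, then slice from each position up to the next one (the last slice runs to the end of the list). This is a minimal sketch; `split_by_indices` is a hypothetical helper name, and it assumes the list starts with a matching element, because anything before the first match is dropped.

```python
import re

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')

def split_by_indices(items, pattern=element_regex):
    """Hypothetical helper: split `items` at every element matching `pattern`."""
    # Positions of every element that starts a new group.
    starts = [i for i, x in enumerate(items) if pattern.match(x)]
    # Each group ends where the next one starts; the last one ends at len(items).
    ends = starts[1:] + [len(items)]
    # Note: elements before the first match are not included in any group.
    return [items[s:e] for s, e in zip(starts, ends)]
```

Like the question's version this still uses `enumerate`, but the slice length comes from the distance to the next match instead of a fixed `4`.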

If you want a one-liner,

from functools import reduce  # reduce is a built-in in Python 2 but lives in functools in Python 3
new_list = reduce(lambda a, b: a[:-1] + [a[-1] + [b]] if not element_regex.match(b) or not a[0] else a + [[b]], test_list, [[]])

will do. The Pythonic way, however, would be to use a more verbose variant.
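To see what the one-liner's accumulator does, here is the same fold with the lambda pulled out into a named function. The names `_step` and `split_with_reduce` are hypothetical; the condition is the logical negation of the one in the lambda, so the behaviour should be identical. Each step builds new lists (`groups + [[item]]` and `groups[-1] + [item]`), so the copying work can grow roughly quadratically with the input size, which is one plausible reason this variant is the slowest in the timings below.

```python
import re
from functools import reduce

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')

def _step(groups, item):
    """Hypothetical helper: fold one item into the list of groups."""
    # A header starts a new group, unless the very first group is still empty
    # (that happens only when the header is the first item of the input).
    if element_regex.match(item) and groups[0]:
        return groups + [[item]]
    # Anything else is appended to a copy of the last group.
    return groups[:-1] + [groups[-1] + [item]]

def split_with_reduce(items):
    # Starts from one empty group, so an empty input gives [[]].
    return reduce(_step, items, [[]])
```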

I did some speed measurements on a 4-core i7 @ 2.1 GHz. The timeit module ran the one-liner 1,000,000 times, which took 11.38 s. Using groupby from the itertools module (Kasra's variant from the other answer) took 9.92 s. The fastest variant is the verbose version I suggested, taking only 5.66 s:

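new_list = []
for item in test_list:
    # Start a new sublist at each header, or for the very first item
    # (starting from [[]] would leave an empty list at the front)
    if not new_list or element_regex.match(item):
        new_list.append([])
    new_list[-1].append(item)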
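A comparison like the one above could be reproduced along these lines with timeit. This is a sketch: the wrapper names `with_reduce`, `with_groupby`, `with_loop` and `benchmark` are hypothetical, the loop is the variant that starts from an empty list, and absolute timings depend on the machine and Python version, so they will not necessarily match the figures quoted above.

```python
import re
import timeit
from functools import reduce
from itertools import groupby
from operator import add

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')
test_list = ['a and b, 123', '1', '2', 'x', 'y', 'Foo and Bar, gibberish',
             '123', '321', 'June', 'July', 'August',
             'Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']

def with_reduce(items):
    # The reduce one-liner from this answer.
    return reduce(lambda a, b: a[:-1] + [a[-1] + [b]]
                  if not element_regex.match(b) or not a[0]
                  else a + [[b]], items, [[]])

def with_groupby(items):
    # The groupby approach from the other answer.
    g_list = [list(g) for _, g in groupby(items, lambda i: 'and' in i)]
    return [add(*g_list[i:i + 2]) for i in range(0, len(g_list), 2)]

def with_loop(items):
    # The verbose loop.
    new_list = []
    for item in items:
        if not new_list or element_regex.match(item):
            new_list.append([])
        new_list[-1].append(item)
    return new_list

def benchmark(number=100_000):
    """Return total seconds per variant for `number` calls on test_list."""
    candidates = {'reduce': with_reduce, 'groupby': with_groupby, 'loop': with_loop}
    # `f=f` binds the current function so each lambda times its own variant.
    return {name: timeit.timeit(lambda f=f: f(test_list), number=number)
            for name, f in candidates.items()}
```

Calling `benchmark()` runs each variant 100,000 times; pass `number=1_000_000` to match the count used above.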

You don't need a regex for that; just use itertools.groupby:

>>> from itertools import groupby
>>> from operator import add
>>> g_list = [list(g) for _, g in groupby(test_list, lambda i: 'and' in i)]
>>> [add(*g_list[i:i + 2]) for i in range(0, len(g_list), 2)]
[['a and b, 123', '1', '2', 'x', 'y'], ['Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']]

First we group the list with the key function lambda i: 'and' in i, which is True for the elements that contain "and". That gives us:

>>> g_list
[['a and b, 123'], ['1', '2', 'x', 'y'], ['Foo and Bar, gibberish'], ['123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar'], ['today', 'tomorrow', 'yesterday']]

Then we concatenate each pair of adjacent groups (a header group and the group that follows it), using operator.add inside a list comprehension.
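The pairwise `add` relies on header and non-header groups strictly alternating. If two header lines are adjacent, groupby merges them into one group, and if a header is the last element, the final slice holds a single list, so `add` is called with one argument and raises TypeError. Also, `'and' in i` is a substring test, so a word like "Sandy" would count as a header. One way around all three, sketched here with a hypothetical helper `split_with_group_ids` and the question's regex in place of the substring test, is to give every element a running header count with `itertools.accumulate` and group on that number instead:

```python
import re
from itertools import accumulate, groupby
from operator import itemgetter

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')

def split_with_group_ids(items, pattern=element_regex):
    """Hypothetical helper: group each header with the items that follow it."""
    # Running count of headers seen so far: every header bumps the id by one,
    # so a header and the non-headers after it share the same id.
    # Items before the first header get id 0 and form their own group.
    group_ids = accumulate(1 if pattern.match(x) else 0 for x in items)
    # Group consecutive (id, item) pairs by id and keep only the items.
    return [[item for _, item in pairs]
            for _, pairs in groupby(zip(group_ids, items), key=itemgetter(0))]
```

Because every header gets a new id, adjacent headers stay in separate groups and a trailing header simply forms a one-element group.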
