
Python: using a dict to speed sorting of a list of tuples

For some reason, I keep having 'how do I sort this list of tuples' questions. (A prior question of mine: sorting list of tuples by arbitrary key.)

Here is some arbitrary raw input:

import random

number_of = 3  # or whatever
tuple_list = [(n, 'a', 'b', 'c') for n in range(number_of)]  # [(0, 'a', 'b', 'c'), ...]
ordering_list = random.sample(range(number_of), number_of)  # e.g. [1, 0, 2]

Sorting tuple_list by ordering_list using sorted:

ordered = sorted(tuple_list, key=lambda t: ordering_list.index(t[0]))
# ordered = [(1, 'a', 'b', 'c'), (0, 'a', 'b', 'c'), (2, 'a', 'b', 'c')]

I have a slightly awkward approach which seems to be much faster, especially as the number of elements in tuple_list grows. I create a dictionary, list_dict, that splits each tuple into a (tuple[0], tuple[1:]) key-value pair. I then look up the dictionary items using ordering_list as keys and reassemble the resulting sequence of keys and values into a list of tuples, using an idiom I'm still trying to wrap my head around completely: zip(*[iter(_list)] * x), where x is the length of each tuple composed of items from _list. So my question is: is there a version of this approach which manages the disassemble-reassemble part of the code better?

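def gen_key_then_values(key_list, list_dict):
    # Yield a flat stream: key, then each of its values, for every key in order.
    for key in key_list:
        values = list_dict[key]
        yield key

        for n in values:
            yield n

# Map each tuple's first element to the rest of the tuple.
list_dict = {t[0]: t[1:] for t in tuple_list}
# Regroup the flat stream into 4-tuples; list() is needed on Python 3, where zip is lazy.
ordered = list(zip(*[gen_key_then_values(ordering_list, list_dict)] * 4))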

Note: better code, following an obvious comment from Steve Jessop below:

list_dict = {t[0]: t for t in tuple_list}
ordered = [list_dict[k] for k in ordering_list]

My actual project code still requires assembling a tuple for each (k, ['a', 'b' ...]) item retrieved from list_dict, but there was no reason for me to include that part of the code here.

Breaking the elements of tuple_list apart in the dictionary doesn't really gain you anything and requires creating a bunch more tuples for the values. All you're doing is looking up elements in the list according to their first element, so it's probably not worth actually splitting them:

list_dict = {t[0]: t for t in tuple_list}

Note that this only works if the first element is unique, but then the ordering_list only makes sense if the first element is unique, so that's probably OK.
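To make the uniqueness caveat concrete, here is a small sketch with made-up data in which two tuples share the same first element. The name has_unique_keys is only illustrative; the length comparison is one cheap way to detect the problem, assuming you want to know about it before sorting.

```python
# Hypothetical data: two tuples share the first element 1.
tuple_list = [(0, 'a'), (1, 'b'), (1, 'c')]
list_dict = {t[0]: t for t in tuple_list}

# The later tuple silently replaces the earlier one.
print(list_dict)  # {0: (0, 'a'), 1: (1, 'c')}

# A cheap guard: the dict must have one entry per tuple.
has_unique_keys = len(list_dict) == len(tuple_list)
print(has_unique_keys)  # False
```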

zip(*[iter(_list)] * 4) is just a way of grouping _list into fours, so give it a suitable name and you won't have to worry about it:

def fixed_size_groups(n, iterable):
    return zip(*[iter(iterable)] * n)
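If the idiom still feels opaque, the following sketch may help. The key point is that the list holds the same iterator n times, so each tuple that zip builds consumes n consecutive items. The sample data (flat, leftover) is invented for illustration, and list() is used because zip is lazy on Python 3.

```python
def fixed_size_groups(n, iterable):
    # [iter(iterable)] * n is a list holding the SAME iterator n times,
    # so zip pulls n consecutive items from it to build each tuple.
    return zip(*[iter(iterable)] * n)

flat = [0, 'a', 'b', 'c', 1, 'a', 'b', 'c']
groups = list(fixed_size_groups(4, flat))
print(groups)  # [(0, 'a', 'b', 'c'), (1, 'a', 'b', 'c')]

# zip stops at the shortest input, so a trailing partial group is dropped.
leftover = list(fixed_size_groups(3, range(7)))
print(leftover)  # [(0, 1, 2), (3, 4, 5)]
```

Note the second call: the seventh item is discarded without any error, so the idiom is only safe when the input length is a multiple of n.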

But all things considered you don't actually need it anyway:

ordered = list(list_dict[val] for val in ordering_list)

The reason your first code is slow is that ordering_list.index is slow: it searches through ordering_list for t[0], and it does this once for each t. So in total it inspects about (number_of ** 2) / 2 list elements on average, which grows quadratically, whereas the dictionary version needs only one lookup per element.
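As a rough sketch of the difference, the code below builds the same result three ways. The middle variant is not from the question: it keeps sorted but precomputes each key's position in a dict (the name position is my own), which removes the repeated linear scans. The size 1000 is arbitrary, and the sketch assumes the first elements are unique.

```python
import random

number_of = 1000  # arbitrary size for illustration
tuple_list = [(n, 'a', 'b', 'c') for n in range(number_of)]
ordering_list = random.sample(range(number_of), number_of)

# Original approach: one linear scan of ordering_list per tuple.
slow = sorted(tuple_list, key=lambda t: ordering_list.index(t[0]))

# Precompute each key's position once, then sort on dict lookups.
position = {key: i for i, key in enumerate(ordering_list)}
by_position = sorted(tuple_list, key=lambda t: position[t[0]])

# Direct lookup, as suggested above: no sort at all.
list_dict = {t[0]: t for t in tuple_list}
direct = [list_dict[k] for k in ordering_list]

print(slow == by_position == direct)  # True
```

All three agree on the result; they differ only in how much work they do. Wrapping each variant in timeit should show the gap widening as number_of grows.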
