
Is this the most efficient way to vertically slice a list of dictionaries for unique values?

I've got a list of dictionaries, and I'm looking for a unique list of values for one of the keys.

This is what I came up with, but I can't help but wonder if it's efficient, time- and/or memory-wise:

list(set([d['key'] for d in my_list]))

Is there a better way?
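For concreteness, here is the approach from the question run against a small made-up sample (the data and the `'key'` field are illustrative only):

```python
# Hypothetical sample data: a list of dictionaries that all share a 'key' field.
my_list = [
    {'key': 'a', 'other': 1},
    {'key': 'b', 'other': 2},
    {'key': 'a', 'other': 3},
]

# The approach from the question: list comprehension -> set -> list.
unique_values = list(set([d['key'] for d in my_list]))
print(sorted(unique_values))  # sorted only for a deterministic display order
```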

This:

list(set([d['key'] for d in my_list]))

… constructs a list of all values, then constructs a set of just the unique values, then constructs a list out of the set.

Let's say you had 10000 items, of which 1000 are unique. You've reduced final storage from 10000 items to 1000, which is great—but you've increased peak storage from 10000 to 11000 (because there clearly has to be a time when the entire list and almost the entire set are both in memory simultaneously).

There are two very simple ways to avoid this.

First (as long as you've got Python 2.4 or later) use a generator expression instead of a list comprehension. In most cases, including this one, that's just a matter of removing the square brackets or turning them into parentheses:

list(set(d['key'] for d in my_list))

Or, even more simply (with Python 2.7 or later), just construct the set directly by using a set comprehension instead of a list comprehension:

list({d['key'] for d in my_list})
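All three spellings produce the same unique values; only the intermediate storage differs. A quick sketch with made-up sample data:

```python
my_list = [{'key': n % 3} for n in range(10)]  # sample data with duplicate values

via_listcomp = list(set([d['key'] for d in my_list]))  # original: builds a throwaway list first
via_genexpr = list(set(d['key'] for d in my_list))     # generator expression: no intermediate list
via_setcomp = list({d['key'] for d in my_list})        # set comprehension: builds the set directly

# Set iteration order is arbitrary, so compare the results as sets.
assert set(via_listcomp) == set(via_genexpr) == set(via_setcomp) == {0, 1, 2}
```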

If you're stuck with Python 2.3 or earlier, you'll have to write an explicit loop. And with 2.2 or earlier, there are no sets, so you'll have to fake it with a dict mapping each key to None or similar.
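That dict-as-set fallback can be sketched like this (written in modern syntax for readability; the sample data is made up):

```python
my_list = [{'key': c} for c in 'abcabc']  # sample data

# Pre-set-type fallback: a dict whose keys stand in for set members.
seen = {}
for d in my_list:
    seen[d['key']] = None
unique_values = list(seen)  # dict keys, in insertion order on Python 3.7+
print(unique_values)  # → ['a', 'b', 'c']
```

As a side effect, the dict version preserves first-seen order, which a real set does not guarantee.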


Beyond space, what about time? Well, clearly you have to traverse the entire list of 10000 dictionaries, and do an O(1) dict lookup for each one.

The original version does a list.append (actually a slightly faster internal equivalent) for each of those lookups; then the set conversion traverses a list of the same size, doing a set.add for each element; then the list conversion traverses a smaller set, doing a list.append for each element. So it's O(N), which is algorithmically optimal, and only worse by a smallish constant factor than just iterating the list and doing nothing.

The set version skips the list.append calls, and iterates only once instead of twice. So it's also O(N), but with an even smaller multiplier. And the savings in memory management (if N is big enough to matter) may help as well.
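A rough way to check the constant-factor claim is to time the two versions with `timeit` (absolute numbers will vary by machine; the sample data is made up):

```python
import timeit

my_list = [{'key': n % 1000} for n in range(10000)]  # 10000 dicts, 1000 unique values

t_list = timeit.timeit(lambda: list(set([d['key'] for d in my_list])), number=100)
t_set = timeit.timeit(lambda: list({d['key'] for d in my_list}), number=100)

# Both are O(N); the set comprehension skips the intermediate list,
# so it typically shows a somewhat smaller constant factor.
print(f"list comp + set(): {t_list:.4f}s   set comprehension: {t_set:.4f}s")
```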
