I have the following data:
names = ['foo','bar','baz', 'spam', 'ham', 'jam']
indices = [0, 2, 3, 4]
size = 3
and want to create a list of the names whose index is in indices. The list must have the size specified in the variable size.
I could not achieve it by doing this (wrong length):
selected_names = []
selected_names = [names[i] for i in indices if len(selected_names) <= size]
# Out[5]: ['foo', 'baz', 'spam', 'ham']
and I don't like this solution because declaring the empty list at the beginning is not elegant.
I can do this:
selected_names = [names[i] for x,i in enumerate(indices) if x <= size]
but that's a bit unreadable and the list length is still wrong.
Is there a correct and more elegant way to create that list? Maybe something like this?
#pseudo code
selected_names = [names[i] for i in indices if list_current_index < size]
enumerate wouldn't even solve this since it would cause you to stop when you'd pulled size elements, not when you'd kept size elements. The only reason it seems to work is that you use a test for <= size (which actually keeps size + 1 elements), and your indices happens to be one element larger than size. If indices was larger, or size smaller, your test wouldn't work as intended.
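A quick check makes the off-by-one visible. This is only a sketch: names, indices and size come from the question, while more_indices is a made-up longer input used for illustration.

```python
names = ['foo', 'bar', 'baz', 'spam', 'ham', 'jam']
indices = [0, 2, 3, 4]
size = 3

# x <= size keeps positions 0..size, which is size + 1 elements
kept = [names[i] for x, i in enumerate(indices) if x <= size]
print(len(kept))  # 4, not 3

# Hypothetical longer input: the result is still size + 1 elements long,
# and the comprehension walks the whole of more_indices to produce it
more_indices = [0, 2, 3, 4, 5, 1]
kept_more = [names[i] for x, i in enumerate(more_indices) if x <= size]
print(kept_more)  # ['foo', 'baz', 'spam', 'ham']
```

Changing the test to x < size would fix the length, but the loop would still visit every entry of the index list instead of stopping early.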
If the goal is to keep size elements, without processing more elements than needed, then the simplest approach (assuming you don't mind slicing to create a small intermediate list, which is usually okay) is just:
selected_names = [names[i] for i in indices[:size]]
If indices and size are huge, you can use itertools.islice in place of the slice to avoid the intermediate list, using less memory but somewhat more CPU:
import itertools
selected_names = [names[i] for i in itertools.islice(indices, size)]
The fastest option I can find, avoiding any explicit looping at all, is using the operator module, though it involves temporaries for argument passing, which is probably a bad idea if size is ever going to be huge (tens of thousands and up):
import operator
selected_names = operator.itemgetter(*indices[:size])(names)
This creates an itemgetter callable that will look up the first size elements from indices, then immediately calls it on names, returning a tuple of all the values (wrap the itemgetter call in list if you need a mutable list result instead of a tuple). It also avoids all Python level loops in CPython; a loop still occurs at the C layer, but a loop at the C layer runs a lot faster than any loop at the Python layer.
For simple ipython %timeit tests, the operator.itemgetter approach won, taking ~24% less time than slice + list comprehension (which in turn was about 9% faster than islice + list comprehension). For larger inputs (I just multiplied indices and size by 100), operator.itemgetter wins by a factor of 3x (slice still beats islice, but by a meaningless margin; the overhead in islice is mostly in setup, and doesn't increase meaningfully as the number of items sliced goes up).
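If you want to repeat the comparison outside ipython, something along these lines should do. It is a rough sketch using the standard timeit module; the three wrapper function names are invented for this example, the extra function call adds a little overhead to every variant, and the numbers will differ by machine and Python version.

```python
import itertools
import operator
import timeit

names = ['foo', 'bar', 'baz', 'spam', 'ham', 'jam']
indices = [0, 2, 3, 4]
size = 3

def with_slice():
    # Slice first, then index: builds a small intermediate list
    return [names[i] for i in indices[:size]]

def with_islice():
    # Lazy slice: no intermediate list, slightly more setup cost
    return [names[i] for i in itertools.islice(indices, size)]

def with_itemgetter():
    # Lookup loop runs in C; returns a tuple rather than a list
    return operator.itemgetter(*indices[:size])(names)

for func in (with_slice, with_islice, with_itemgetter):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.4f} s per 100,000 calls")
```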
All are equivalent to:
selected_names = [names[i] for i in indices][:size]
except they don't populate the complete list first, then cut it down to size; they get enough entries and stop immediately.
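To convince yourself of the equivalence, you can compare the results directly. This sketch reuses the question's data; the variable names on the left are just labels chosen for the example.

```python
import itertools
import operator

names = ['foo', 'bar', 'baz', 'spam', 'ham', 'jam']
indices = [0, 2, 3, 4]
size = 3

full_then_cut = [names[i] for i in indices][:size]
sliced = [names[i] for i in indices[:size]]
lazy = [names[i] for i in itertools.islice(indices, size)]
getter = operator.itemgetter(*indices[:size])(names)

print(sliced == full_then_cut)        # True
print(lazy == full_then_cut)          # True
print(list(getter) == full_then_cut)  # True; getter itself is a tuple

# With a single index, itemgetter returns the bare item, not a 1-tuple
single = operator.itemgetter(*indices[:1])(names)
print(single)  # foo
```

The itemgetter variant therefore has two edge cases worth guarding against: size == 1 gives you the item itself instead of a sequence, and size == 0 raises TypeError because itemgetter needs at least one index. The two comprehension variants handle both cases without special treatment.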