
Index Python List with Numpy Boolean Array

Is there a way to index a Python list like x = ['a','b','c'] using a NumPy boolean array? I'm currently getting the following error: TypeError: only integer arrays with one element can be converted to an index

Indexing via [] calls the object's __getitem__ method behind the scenes. For objects implemented in pure Python, you can simply override this method with any function that suits your needs. Lists, however, are implemented in C, so you are not allowed to replace list.__getitem__. Therefore there is no direct way to do what you request.
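
As a rough illustration of the point above, the following sketch first shows that assigning to list.__getitem__ raises a TypeError, and then overrides __getitem__ in a pure-Python subclass instead. The class name MaskableList is hypothetical (it is not part of Python or NumPy), and the mask handling is just one possible design, not NumPy's full indexing semantics.

```python
import numpy as np

# Built-in types implemented in C reject attribute assignment.
try:
    list.__getitem__ = lambda self, key: None
    patched = True
except TypeError:
    patched = False


class MaskableList(list):
    """Hypothetical list subclass that also accepts a boolean mask."""

    def __getitem__(self, key):
        if isinstance(key, np.ndarray) and key.dtype == bool:
            if len(key) != len(self):
                raise IndexError("mask length must match list length")
            # Keep the items whose mask entry is True.
            return [item for item, keep in zip(self, key) if keep]
        # Fall back to normal integer and slice indexing.
        return super().__getitem__(key)


x = MaskableList(['a', 'b', 'c'])
mask = np.array([True, False, True])
selected = x[mask]   # items at the True positions
first = x[0]         # ordinary indexing still works
```

This only helps if you control where the list is created, since an existing plain list would have to be wrapped in the subclass first.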

You can however make a NumPy array out of your list and then do NumPy-style boolean indexing on that:

import numpy as np

x = ['a', 'b', 'c']

mask = np.array([True, False, True])
x_arr = np.asarray(x, dtype=object)
output = x_arr[mask]  # Get items
x_arr[mask] = ['new', 'values']  # Set items

Unfortunately, np.asarray cannot make a view over your list, so the list is copied. This means that the original x is unchanged when you assign new values to the elements of x_arr.
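
A small sketch of that copy behaviour, together with one possible way to push the changes back into the original list using slice assignment. The slice-assignment step is a workaround suggested here as an assumption about what you might want, not something NumPy does for you.

```python
import numpy as np

x = ['a', 'b', 'c']
mask = np.array([True, False, True])

x_arr = np.asarray(x, dtype=object)   # builds a new array from the list
x_arr[mask] = ['new', 'values']       # modifies only the array

unchanged = list(x)                   # snapshot showing x was not touched

# Write the array's contents back into the same list object.
x[:] = x_arr.tolist()
```

Because x[:] = ... replaces the contents in place, any other name bound to the same list sees the update as well.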

If you really want the full power of NumPy boolean indexing on lists, you have to write a function which does this from scratch, and you will not be able to use the [] indexing syntax.
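
A minimal sketch of what such from-scratch helpers might look like. The names mask_get and mask_set are hypothetical, and the sketch assumes a one-dimensional mask with the same length as the list; it makes no attempt to reproduce the rest of NumPy's indexing rules.

```python
def mask_get(seq, mask):
    """Return the items of seq whose mask entry is truthy."""
    if len(seq) != len(mask):
        raise IndexError("mask length must match sequence length")
    return [item for item, keep in zip(seq, mask) if keep]


def mask_set(seq, mask, values):
    """Assign values, in order, to the positions of seq where mask is truthy."""
    if len(seq) != len(mask):
        raise IndexError("mask length must match sequence length")
    positions = [i for i, keep in enumerate(mask) if keep]
    values = list(values)
    if len(values) != len(positions):
        raise ValueError("need exactly one value per selected position")
    for i, value in zip(positions, values):
        seq[i] = value  # modifies the original list in place


x = ['a', 'b', 'c']
mask = [True, False, True]  # a NumPy boolean array can be passed here too

picked = mask_get(x, mask)
mask_set(x, mask, ['new', 'values'])
```

Unlike the array route, mask_set changes the original list directly, so no copy back is needed.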

In [304]: ['a','b','c'][[2,1,0]]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-304-c04b1f0621a3> in <module>()
----> 1 ['a','b','c'][[2,1,0]]

TypeError: list indices must be integers or slices, not list

List comprehension route

In [306]: [i for i,j in zip(['a','b','c'],[True, False, True]) if j]
Out[306]: ['a', 'c']

Array route

In [308]: np.array(['a','b','c'])[np.array([True, False, True])]
Out[308]: 
array(['a', 'c'], 
      dtype='<U1')

back to list:

In [309]: np.array(['a','b','c'])[np.array([True, False, True])].tolist()
Out[309]: ['a', 'c']

But be careful if your list contains arbitrary objects, as opposed to numbers or strings. The round trip through an array might not preserve the references to the original objects.

The operator module has an itemgetter function:

In [321]: operator.itemgetter(*[2,0,1])(list('abc'))
Out[321]: ('c', 'a', 'b')

But under the covers it is just list-comprehension-like iteration, and I don't offhand see a boolean version in that module.
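
For what it's worth, the standard library's itertools.compress appears to cover the boolean case: it yields the items whose selector is truthy. A short sketch, with illustrative variable names:

```python
from itertools import compress

x = ['a', 'b', 'c']
mask = [True, False, True]  # any iterable of truthy/falsy values

# compress pairs each item with its selector and keeps the truthy ones.
selected = list(compress(x, mask))
```

Like the list comprehension, this only reads from the list; it does not support assignment through the mask.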

list(map(x.__getitem__, np.where(mask)[0]))  # list(...) because map is lazy in Python 3

Or, if you prefer a list comprehension:

[x[i] for i in np.where(mask)[0]]

This avoids a Python-level loop over the whole list, which helps especially when mask is sparse.

Do you need it to be a list? Since you want the indexing behavior of a NumPy array, your code would make more sense to other readers if you actually used a NumPy array.

Maybe try using an array with a fixed-width string dtype such as dtype='U1'? (The byte-string variant is dtype='S', which older code spells 'a'.) For example, in the code below,

import numpy as np

x = np.array(['a', 'b', 'c'], dtype='U1')
print(x)
print(x == 'c')
print(x[x == 'c'])

This will print the following arrays,

['a' 'b' 'c']
[False False  True]
['c']

Assignment will work as you would expect too,

x[x == 'c'] = 'z'
print(x)

This will print the modified array,

['a' 'b' 'z']

The only concern is that the elements of the array cannot be longer than the allocated length. Here it is specified as one with dtype='U1'. You can use dtype='U5' or, in general, dtype='UN' for any length N you want. Every element of the array must be a string no longer than that maximum length.

If you pass a string that is too long, the end is chopped off, as in the following example with dtype set to 'U2':

x = np.array(['abc', 'bcd', 'cde'], dtype='U2')
print(x)

which gives,

['ab' 'bc' 'cd']


 