
Delete some elements from numpy array

One interesting question:

I would like to delete some elements from a numpy array. In the simplified example below, it works as long as I don't delete the last element, but it fails when I try to delete the last element. The code below works fine:

import numpy as np

values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,4,1]:
    values = np.delete(values,i)
print(values)

The output is:

[0 1 2 3 4 5]
[0 2 4]

If we only change 4 to 5, then it will fail:

import numpy as np

values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,i)
print(values)

The error message:

IndexError: index 5 is out of bounds for axis 0 with size 5

Why does this error only happen when deleting the last element? What is the correct way to do such tasks?

Keep in mind that np.delete(arr, ind) deletes the element at index ind NOT the one with that value.

This means that as you delete things, the array gets shorter. So you start with

values = np.array([0, 1, 2, 3, 4, 5])
values = np.delete(values, 3)
# values is now [0 1 2 4 5]: the element at index 3 is gone, so only 5 elements remain
# the next call asks for index 5, but the valid indices now only go from 0 to 4
values = np.delete(values, 5)  # raises IndexError
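To make the shrinking visible, here is a minimal sketch that records the array size after each deletion in the working case and then reproduces the failing step. The variable names sizes and failed are only for illustration.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])
sizes = []
for i in [3, 4, 1]:
    values = np.delete(values, i)  # returns a new, shorter array each time
    sizes.append(values.size)

print(sizes)   # [5, 4, 3]
print(values)  # [0 2 4]

# After deleting index 3 only five elements remain, so index 5 is invalid.
failed = False
try:
    np.delete(np.array([0, 1, 2, 4, 5]), 5)
except IndexError:
    failed = True
print(failed)  # True
```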

One way to solve the problem is to sort the indices that you want to delete in descending order (if you really want to delete elements one at a time).

inds_to_delete = sorted([3,1,5], reverse=True) # [5,3,1]
# then delete in order of largest to smallest ind
for ind in inds_to_delete:
    values = np.delete(values, ind)

Or:

inds_to_keep = np.array([0,2,4])
values = values[inds_to_keep]
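If you only have the indices to delete, one possible way to derive inds_to_keep is to subtract them from the full range of positions. This is a sketch, assuming the indices refer to the original array; inds_to_delete and kept are illustrative names.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])
inds_to_delete = [3, 5, 1]

# All positions minus the ones to drop; setdiff1d returns a sorted array.
inds_to_keep = np.setdiff1d(np.arange(values.size), inds_to_delete)
kept = values[inds_to_keep]

print(inds_to_keep)  # [0 2 4]
print(kept)          # [0 2 4]
```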

The problem is that you have already deleted items from values, so when you try to delete the item at index 5 there is no longer a value at that index; it is now at index 4.

If you sort the list of indices to delete and iterate over them from large to small, that should work around this issue.

import numpy as np

values = np.array([0,1,2,3,4,5])
print(values)
for i in [5,3,1]:  # iterate in descending order
    values = np.delete(values,i)
print(values)

A probably faster way (because you delete everything at once rather than one value at a time) is to use a boolean mask:

values = np.array([0,1,2,3,4,5])
tobedeleted = np.array([False, True, False, True, False, True])
# So index 3, 5 and 1 are True so they will be deleted.
values_deleted = values[~tobedeleted]
#that just gives you what you want.

This approach is recommended in the NumPy reference for np.delete.
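Writing the mask out by hand gets tedious for longer arrays. A small sketch of building it from a list of indices follows; the values 10 to 15 are chosen here only so that indices and values are easy to tell apart.

```python
import numpy as np

values = np.array([10, 11, 12, 13, 14, 15])
inds_to_delete = [3, 5, 1]

# Start with an all-False mask and flag the positions to drop.
tobedeleted = np.zeros(values.size, dtype=bool)
tobedeleted[inds_to_delete] = True

values_deleted = values[~tobedeleted]
print(tobedeleted)     # [False  True False  True False  True]
print(values_deleted)  # [10 12 14]
```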

To your question: you delete one element, so the array gets shorter and index 5 is no longer in the array, because the element formerly at index 5 is now at index 4. Delete in descending order if you want to use np.delete.

If you really want to delete with np.delete use the shorthand:

np.delete(values, [3,5,1])
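With a list, all indices refer to positions in the original array, so their order should not matter, and the input array is left untouched because np.delete returns a new array. A short check (the names a and b are illustrative):

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])

a = np.delete(values, [3, 5, 1])
b = np.delete(values, [1, 3, 5])  # same indices, different order

print(a)                     # [0 2 4]
print(np.array_equal(a, b))  # True
print(values)                # [0 1 2 3 4 5]
```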

If you want to delete by value rather than by index, you have to alter the procedure a bit. To delete all occurrences of the value 5 in your array, you can use:

values[values != 5]

or with multiple values to delete:

to_delete = (values == 5) | (values == 3)  | (values == 1)
values[~to_delete]

All of these give you the desired result. I'm not sure what your data really looks like, so I can't say for sure which will be the most appropriate.

If you want to remove the elements at indices 3, 4 and 1, just do np.delete(values, [3, 4, 1]).

If, as in the first case, you want to delete the fourth item (index 3), then the fifth of what remains, and finally the second of what remains, then because of the order of the operations you actually delete the second, fourth and sixth items of the initial array. It is therefore logical that the second case fails: after the first deletion there is no sixth item left.

You can compute the shifts (in the example, fifth becomes sixth) this way:

def multidelete(values, todelete):
    todelete = np.array(todelete)
    # shift[j] counts the earlier deletions whose index is <= todelete[j]
    shift = np.triu(todelete >= todelete[:, None], 1).sum(0)
    return np.delete(values, todelete + shift)

Some tests:

In [91]: multidelete([0, 1, 2, 3, 4, 5],[3,4,1])
Out[91]: array([0, 2, 4])

In [92]: multidelete([0, 1, 2, 3, 4, 5],[1,1,1])
Out[92]: array([0, 4, 5])
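As a cross-check on the shift computation, a plain-Python sketch can replay the one-at-a-time deletions on a list of positions and collect the original indices they correspond to. The helper name sequential_to_original is hypothetical, not part of NumPy. It also covers orderings such as [3, 2, 2], where a repeated smaller index follows a larger one.

```python
import numpy as np

def sequential_to_original(n, todelete):
    """Map indices meant for one-at-a-time deletion to original positions."""
    remaining = list(range(n))  # original positions still present
    # pop(i) removes the i-th remaining position, just like the loop would;
    # it raises IndexError if i is out of range at that step.
    return [remaining.pop(i) for i in todelete]

values = np.array([0, 1, 2, 3, 4, 5])
orig_inds = sequential_to_original(values.size, [3, 4, 1])

print(orig_inds)                     # [3, 5, 1]
print(np.delete(values, orig_inds))  # [0 2 4]
```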

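NB: in older NumPy versions, np.delete did not complain and did nothing if the bad indices were given in a list: np.delete(values, [8]) returned an unchanged copy of values. Recent versions raise an IndexError in that case too.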

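Passing a boolean array to np.delete was deprecated in older NumPy versions (ordinary boolean mask indexing such as values[mask] is not deprecated). You can use np.where() instead, like this; note that it deletes by value rather than by index:

values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,np.where(values==i))
    # values = np.delete(values,values==i) # boolean obj: behaviour depends on the NumPy version (older ones warned)
print(values)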

I know this question is old, but for future reference (as I ran into a similar problem):

Instead of writing a for loop, one solution is to filter the array with NumPy's isin function, like so:

>>> import numpy as np
>>> # np.isin(element, test_elements, assume_unique=False, invert=False)

>>> arr = np.array([1, 4, 7, 10, 5, 10])
>>> ~np.isin(arr, [4, 10])
array([ True, False,  True, False,  True, False])
>>> arr = arr[ ~np.isin(arr, [4, 10]) ]
>>> arr
array([1, 7, 5])

So for this particular case we can write:

values = np.array([0,1,2,3,4,5])
torem = [3,4,1]
values = values[ ~np.isin(values, torem) ]

which leaves values as array([0, 2, 5]). Note that this removes elements by value, not by index.

Here's how you can do it without any loop or any indexing, using numpy.setdiff1d:

>>> import numpy as np
>>> array_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> array_1
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> remove_these = np.array([1,3,5,7,9])
>>> remove_these
array([1, 3, 5, 7, 9])
>>> np.setdiff1d(array_1, remove_these)
array([ 2,  4,  6,  8, 10])
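One thing to keep in mind is that np.setdiff1d returns the sorted, unique values, so the original order and any duplicates are lost. The sketch below contrasts it with an np.isin mask, which keeps both; the sample array is made up for illustration.

```python
import numpy as np

arr = np.array([5, 1, 4, 1, 10])
remove_these = [4, 10]

as_set = np.setdiff1d(arr, remove_these)       # sorted, duplicates dropped
in_order = arr[~np.isin(arr, remove_these)]    # original order, duplicates kept

print(as_set)    # [1 5]
print(in_order)  # [5 1 1]
```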
