
Delete some elements from numpy array

One interesting question:

I would like to delete some elements from a numpy array. In the simplified example below, it works as long as the last element is not among those deleted, but it fails when we try to delete the last element. The code below works fine:

import numpy as np

values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,4,1]:
    values = np.delete(values,i)
print(values)

The output is:

[0 1 2 3 4 5]
[0 2 4]

If we only change 4 to 5, it fails:

import numpy as np

values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,i)
print(values)

The error message:

IndexError: index 5 is out of bounds for axis 0 with size 5

Why does this error only happen when deleting the last element? What is the correct way to do such tasks?

Keep in mind that np.delete(arr, ind) deletes the element at index ind, NOT the element with that value.

This means that as you delete things, the array gets shorter. So you start with

values = np.array([0,1,2,3,4,5])
values = np.delete(values, 3)
# values is now [0 1 2 4 5]: the element at index 3 is gone, so only 5 elements remain
# this tries to delete the element at index 5, but the indices now only go from 0 to 4
np.delete(values, 5)  # raises IndexError

One way to solve the problem is to sort the indices that you want to delete in descending order (if you really want to delete from the array one element at a time).

inds_to_delete = sorted([3,1,5], reverse=True) # [5,3,1]
# then delete in order of largest to smallest index
for ind in inds_to_delete:
    values = np.delete(values, ind)

Or:

inds_to_keep = np.array([0,2,4])
values = values[inds_to_keep]
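
If you know the indices to delete rather than the indices to keep, the keep list can be derived from them. A minimal sketch, assuming the indices refer to positions in the original array (the names kept and inds_to_delete are illustrative, not from the original answer):

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])
inds_to_delete = [3, 5, 1]

# All positions of the original array, minus the ones to delete.
inds_to_keep = np.setdiff1d(np.arange(len(values)), inds_to_delete)
kept = values[inds_to_keep]
print(kept.tolist())  # [0, 2, 4]
```

Because nothing is deleted step by step here, the order of inds_to_delete does not matter.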

The problem is that you have already deleted items from values, so by the time you try to delete the item at index 5 there is no longer a value at that index; it is now at index 4.

If you sort the list of indices to delete and iterate over them from largest to smallest, that should work around this issue.

import numpy as np

values = np.array([0,1,2,3,4,5])
print(values)
for i in [5,3,1]:  # iterate in descending order
    values = np.delete(values,i)
print(values)

A probably faster way (because you delete all the values at once instead of one at a time) is to use a boolean mask:

values = np.array([0,1,2,3,4,5])
tobedeleted = np.array([False, True, False, True, False, True])
# Indices 1, 3 and 5 are True, so those elements will be deleted.
values_deleted = values[~tobedeleted]
# values_deleted is now [0 2 4], which is what you want.
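
Writing the mask out by hand gets tedious for longer arrays. As a rough sketch, the same mask can be built from a list of indices (inds_to_delete is an illustrative name, not from the original answer):

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])
inds_to_delete = [3, 5, 1]

# Start with an all-False mask, then mark the positions to drop.
tobedeleted = np.zeros(len(values), dtype=bool)
tobedeleted[inds_to_delete] = True

values_deleted = values[~tobedeleted]
print(values_deleted.tolist())  # [0, 2, 4]
```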

This is recommended in the numpy reference for np.delete.

To your question: you delete one element, so the array gets shorter and index 5 is no longer in the array, because the element formerly at index 5 is now at index 4. Delete in descending order if you want to use np.delete in a loop.

If you really want to delete with np.delete, use the shorthand:

np.delete(values, [3,5,1])
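
As a small illustration of the shorthand: when a list is passed, all indices refer to the original array, and np.delete returns a new array rather than modifying its input. The variable name result is only for this sketch.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])

# All three indices are interpreted against the original, unshortened array.
result = np.delete(values, [3, 5, 1])

print(result.tolist())  # [0, 2, 4]
print(values.tolist())  # [0, 1, 2, 3, 4, 5] (the input is left unchanged)
```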

If you want to delete by value (not by index), you have to alter the procedure a bit. If you want to delete all elements equal to 5 in your array, you can use:

values[values != 5]

or with multiple values to delete:

to_delete = (values == 5) | (values == 3)  | (values == 1)
values[~to_delete]

All of these give you the desired result. I'm not sure what your data really looks like, so I can't say for sure which will be the most appropriate.

If you want to remove the elements at indices 3, 4 and 1, just do np.delete(values,[3,4,1]).

If, as in the first case, you want to delete the fourth item (index 3), then the fifth of the remainder, and finally the second of the remainder, then because of the order of the operations you end up deleting the second, fourth and sixth items of the initial array. It is therefore logical that the second case fails.

You can compute the shifts (in the example, fifth becomes sixth) this way:

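import numpy as np

def multidelete(values, todelete):
    todelete = np.array(todelete)
    # shift[j] counts the earlier entries of todelete that are <= todelete[j]
    shift = np.triu(todelete >= todelete[:, None], 1).sum(0)
    return np.delete(values, todelete + shift)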

Some tests:

In [91]: multidelete([0, 1, 2, 3, 4, 5],[3,4,1])
Out[91]: array([0, 2, 4])

In [92]: multidelete([0, 1, 2, 3, 4, 5],[1,1,1])
Out[92]: array([0, 4, 5])
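
The shift formula appears to differ from a plain loop when a later index is smaller than an earlier one (for example [1, 0, 0]). As a hedged alternative, the sketch below maps each step-by-step index back to its position in the original array by tracking which positions are still present. The helper names sequential_to_original and multidelete_sequential are hypothetical, not part of numpy or the original answer.

```python
import numpy as np

def sequential_to_original(n, todelete):
    """Map indices meant for successive deletions to positions in the original array."""
    remaining = list(range(n))  # original positions that are still present
    return [remaining.pop(i) for i in todelete]

def multidelete_sequential(values, todelete):
    """Delete as a loop of np.delete calls would, but in a single np.delete call."""
    values = np.asarray(values)
    original = sequential_to_original(len(values), todelete)
    return np.delete(values, original)

print(multidelete_sequential([0, 1, 2, 3, 4, 5], [3, 4, 1]).tolist())  # [0, 2, 4]
print(multidelete_sequential([0, 1, 2, 3, 4, 5], [1, 0, 0]).tolist())  # [3, 4, 5]
```

Like the loop in the question, this raises an IndexError if an index is out of range at the step where it is used.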

NB: in older NumPy versions, np.delete does not complain and does nothing if the bad index (or indices) is given in a list: np.delete(values,[8]) returns an array equal to values. Newer NumPy versions raise an IndexError in that case.

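Passing a boolean array as the index argument of np.delete gives a warning in older NumPy versions (boolean indexing itself, as in values[mask], is not deprecated). To delete by value, you can use the function np.where() instead, like this: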

values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,np.where(values==i))
    # values = np.delete(values,values==i) # still works with warning
print(values)

I know this question is old, but for future reference (as I ran into a similar problem):

Instead of writing a for loop, one solution is to filter the array with numpy's isin function, like so:

>>> import numpy as np
>>> # np.isin(element, test_elements, assume_unique=False, invert=False)

>>> arr = np.array([1, 4, 7, 10, 5, 10])
>>> ~np.isin(arr, [4, 10])
array([ True, False,  True, False,  True, False])
>>> arr = arr[ ~np.isin(arr, [4, 10]) ]
>>> arr
array([1, 7, 5])

So for this particular case we can write:

values = np.array([0,1,2,3,4,5])
torem = [3,4,1]
values = values[ ~np.isin(values, torem) ]

which outputs: array([0, 2, 5]). Note that this removes elements by value, not by index.
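
Filtering with isin matches on values, while np.delete works on positions; the two only look alike in the question because each element equals its own index. A small sketch with different data (the array contents and the name targets are chosen purely for illustration) makes the difference visible:

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60])
targets = [3, 4, 1]

# Positional: drops the elements at positions 1, 3 and 4.
by_position = np.delete(values, targets)

# By value: drops nothing, because no element equals 1, 3 or 4.
by_value = values[~np.isin(values, targets)]

print(by_position.tolist())  # [10, 30, 60]
print(by_value.tolist())     # [10, 20, 30, 40, 50, 60]
```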

Here's how you can do it without any loop or any indexing, using numpy.setdiff1d:

>>> import numpy as np
>>> array_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> array_1
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> remove_these = np.array([1,3,5,7,9])
>>> remove_these
array([1, 3, 5, 7, 9])
>>> np.setdiff1d(array_1, remove_these)
array([ 2,  4,  6,  8, 10])
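
One thing to keep in mind: np.setdiff1d returns the sorted, unique values, so it does not preserve the original order or duplicates. A brief sketch comparing it with an isin mask (the sample data is made up for illustration):

```python
import numpy as np

arr = np.array([10, 4, 7, 10, 5])
remove_these = [4]

# setdiff1d sorts the result and collapses the repeated 10.
as_set = np.setdiff1d(arr, remove_these)

# The isin mask keeps the original order and both 10s.
as_mask = arr[~np.isin(arr, remove_these)]

print(as_set.tolist())   # [5, 7, 10]
print(as_mask.tolist())  # [10, 7, 10, 5]
```

If order or repeated values matter, the mask-based form is probably the safer choice.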
