How to remove duplicate rows, including strings, from a NumPy array?
I want to remove duplicate rows from a NumPy array.
>import numpy as np
>data = np.array([[1,"a",3,3,4],
                  [1,8,9,9,4],
                  [1,"a",3,3,4]])
>new_array = [tuple(row) for row in data]
>uniques = np.unique(new_array)
>uniques
output: array(['1', '3', '4', '8', '9', 'a'], dtype='<U1')
But what I want is:
output: np.array([[1,"a",3,3,4],[1,8,9,9,4]])
How can I do this? Thanks.
Use vstack:
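# Newer NumPy versions require a sequence here, so the set is wrapped in a list.
print(np.vstack(list({tuple(row) for row in data})))
Output (row order may vary, because a set is unordered):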
[['1' 'a' '3' '3' '4']
['1' '8' '9' '9' '4']]
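You can't get the numbers back as integers this way, because a regular NumPy array has a single dtype: mixing integers and strings converts every element to a string when the array is created (unless the array is built with dtype=object).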
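As a minimal sketch of the same tuple-based idea, `dict.fromkeys` can stand in for the set so that the rows keep their original order. This is a swapped-in alternative, not the method from the answer above, and the name `unique_rows` is only illustrative. The snippet also prints the dtype kind to show that the integers were already converted to strings when the array was built.

```python
import numpy as np

data = np.array([[1, "a", 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, "a", 3, 3, 4]])

# Mixing ints and strings makes NumPy choose a string dtype for the whole array.
print(data.dtype.kind)  # 'U' stands for a Unicode string dtype

# dict.fromkeys keeps the first occurrence of each row, in insertion order.
# Rows are converted to tuples because arrays themselves are not hashable.
unique_rows = np.array(list(dict.fromkeys(tuple(row) for row in data)))
print(unique_rows)
```

Because a dict preserves insertion order, the first occurrence of each row stays in place, which the set-based version does not guarantee.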
NumPy handles arrays with mixed data types poorly, so how about using pandas drop_duplicates instead?
import pandas as pd
data = [[1,"a",3,3,4],[1,8,9,9,4], [1,"a",3,3,4]]
pd.DataFrame(data).drop_duplicates().values
# array([[1, 'a', 3, 3, 4],
# [1, 8, 9, 9, 4]], dtype=object)
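If pandas is not available, a similar result can probably be reached in plain NumPy by building the array with `dtype=object`, which stores the original Python objects instead of converting them to strings. The sketch below is an illustrative assumption rather than part of the answers above; the names `arr` and `unique_rows` are hypothetical.

```python
import numpy as np

data = [[1, "a", 3, 3, 4], [1, 8, 9, 9, 4], [1, "a", 3, 3, 4]]

# dtype=object keeps ints as ints and strings as strings.
arr = np.array(data, dtype=object)

# Keep the first occurrence of each row; tuples are hashable, array rows are not.
seen = dict.fromkeys(tuple(row) for row in arr)
unique_rows = np.array(list(seen), dtype=object)
print(unique_rows)
```

Object arrays give up most of NumPy's vectorized speed, so this is mainly useful when the mixed types must be preserved and the data is small.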