How do I remove duplicate rows, including strings, from a numpy array?

Question

I want to remove duplicate rows from a numpy array.

>>> import numpy as np
>>> data = np.array([[1, "a", 3, 3, 4],
...                  [1, 8, 9, 9, 4],
...                  [1, "a", 3, 3, 4]])
>>> new_array = [tuple(row) for row in data]
>>> uniques = np.unique(new_array)
>>> uniques

output: array(['1', '3', '4', '8', '9', 'a'], dtype='<U1')

But what I want is:

output: np.array([[1,"a",3,3,4],[1,8,9,9,4]])

How can I do this? Thanks.
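The flat output above happens because np.unique flattens its input unless an axis is given. As a sketch of an alternative that neither answer uses, np.unique(..., axis=0) (available since NumPy 1.13) compares whole rows instead. It works on data because np.array has already turned every element into a string, so the array has a single Unicode dtype. Note that the unique rows come back sorted, not in their original order.

```python
import numpy as np

data = np.array([[1, "a", 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, "a", 3, 3, 4]])

# Without axis, np.unique flattens the array and returns unique scalars.
flat = np.unique(data)

# With axis=0, each row is treated as one item, so duplicate rows are removed.
# The surviving rows are returned in sorted (lexicographic) order.
rows = np.unique(data, axis=0)
print(rows)
```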

Answer 1

Use np.vstack on a set of row tuples:

# vstack needs a list or tuple (newer NumPy rejects a set), so convert the set first
print(np.vstack(list({tuple(row) for row in data})))

Output (row order may vary, since a set is unordered):

[['1' 'a' '3' '3' '4']
 ['1' '8' '9' '9' '4']]

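You can't get integers back this way: a regular NumPy array holds a single dtype, so np.array converted every element of data to a string when the array was created. Keeping mixed types would require an object array (dtype=object).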
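To keep the rows in their original order, one option (an assumption, not part of the answer) is to replace the set with dict.fromkeys, which drops repeated keys while keeping first occurrences in insertion order (guaranteed for dicts since Python 3.7). The rest of the approach, stacking the unique row tuples with np.vstack, is unchanged.

```python
import numpy as np

data = np.array([[1, "a", 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, "a", 3, 3, 4]])

# dict.fromkeys keeps only the first occurrence of each row tuple,
# in the order the rows appear in data.
unique_rows = list(dict.fromkeys(tuple(row) for row in data))

# np.vstack needs a sequence (list or tuple), not a set or generator.
result = np.vstack(unique_rows)
print(result)
```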

Answer 2

NumPy is poorly suited to arrays with mixed data types, so how about using pandas drop_duplicates instead?

import pandas as pd

data = [[1,"a",3,3,4],[1,8,9,9,4], [1,"a",3,3,4]]
# drop_duplicates() removes repeated rows; .values returns the result as a NumPy array
pd.DataFrame(data).drop_duplicates().values

# array([[1, 'a', 3, 3, 4],
#        [1, 8, 9, 9, 4]], dtype=object)
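A sketch that splits the one-liner above into steps, assuming pandas 0.24 or later for DataFrame.to_numpy() (the newer spelling of .values). keep="first" is already the default of drop_duplicates and is written out only for clarity. Because each DataFrame column has its own dtype, the all-integer columns stay int64 until to_numpy() merges every column into a single object array.

```python
import pandas as pd

data = [[1, "a", 3, 3, 4],
        [1, 8, 9, 9, 4],
        [1, "a", 3, 3, 4]]

df = pd.DataFrame(data)

# keep="first" (the default) retains the first occurrence of each duplicate row.
deduped = df.drop_duplicates(keep="first")

# Column 0 holds only integers (int64); column 1 mixes "a" and 8 (object).
print(deduped.dtypes)

# to_numpy() combines the columns; the mixed column forces dtype=object.
arr = deduped.to_numpy()
print(arr)
```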


Answer 1: score 0, posted 2019-02-13 10:24:37

Answer 2: score 0, accepted, posted 2019-02-13 10:26:17
