Python: how to find the first occurrence of duplicated items in a NumPy array

Question

I ran into a problem while trying to mark the first occurrence of each item in an array.
For example:

array = ['a', 'b', 'c', 'b', 'b', 'a', 'c', 'a']

should return: [True, True, True, False, False, False, False, False]

I have tried np.unique, but it returns either the unique values or the indices of the unique values.
Is there a function that can do this?

Answer 1

You were on the right track with np.unique. With return_index=True it also returns the index of the first occurrence of each unique value, which is the information you need.

I extended your example to show that this works regardless of where the unique values appear.

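import numpy as np

array = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

# i holds the index of the first occurrence of each unique value
_, i = np.unique(array, return_index=True)
# start with an all-False mask and flag the first occurrences
res = np.zeros_like(array, dtype=bool)
res[i] = True
print(res)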

Out:

[ True  True  True False False False False  True False]
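
As a minimal sketch, the same idea can be wrapped in a small helper. The function name first_occurrence_mask is made up for illustration, and the helper assumes a 1-D input. Indexing the array with the mask also shows that it keeps the unique values in their original order.

```python
import numpy as np

def first_occurrence_mask(values):
    """Return a boolean mask that is True at the first occurrence of each value.

    Hypothetical helper for illustration; assumes a 1-D input.
    """
    arr = np.asarray(values)
    # return_index=True yields the index of the first occurrence of each unique value
    _, first_idx = np.unique(arr, return_index=True)
    mask = np.zeros(arr.shape, dtype=bool)
    mask[first_idx] = True
    return mask

array = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])
mask = first_occurrence_mask(array)
print(mask)
# Filtering with the mask keeps each value once, in order of first appearance
print(array[mask])
```
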

Answer 2

If it's OK to use pandas, there is a convenience method called duplicated() that can be used on a Series.

Essentially, wrap the NumPy array in the Series constructor, call the method, negate the result, and convert the boolean Series to a NumPy array.

Example:

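import numpy as np
import pandas as pd

a = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

# duplicated(keep='first') flags every repeat after the first occurrence; ~ inverts it
(~pd.Series(a).duplicated(keep='first')).to_numpy()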

Output:

array([ True,  True,  True, False, False, False, False,  True, False])
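
A quick way to check that both answers agree is to compare the two masks directly. This is only a sketch: the helper name first_occurrence_mask_pd is hypothetical, and it assumes both numpy and pandas are installed.

```python
import numpy as np
import pandas as pd

def first_occurrence_mask_pd(values):
    # Hypothetical helper: duplicated(keep='first') is True for every repeat
    # after the first occurrence, so negating it marks the first occurrences
    return (~pd.Series(values).duplicated(keep='first')).to_numpy()

a = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])
mask_pd = first_occurrence_mask_pd(a)

# The np.unique approach from Answer 1, for comparison
_, i = np.unique(a, return_index=True)
mask_np = np.zeros(a.shape, dtype=bool)
mask_np[i] = True

print(np.array_equal(mask_pd, mask_np))
```
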
