Python: how to find the first occurrence of duplicated items in a NumPy array

I ran into a problem while trying to mark the first occurrence of each item in an array.
For example:

array = [a, b, c, b, b, a, c, a]

expected result: [True, True, True, False, False, False, False, False]

I tried the np.unique function, but it returns either the unique values or the indices of the unique values.
Is there a function that can produce this boolean mask?

You were on the right track with np.unique. With return_index=True it also returns the index of the first occurrence of each unique value, which is exactly the information you need.

I extended your example to show that this works regardless of where the unique values appear.

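import numpy as np

array = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

# i holds the index of the first occurrence of each unique value
_, i = np.unique(array, return_index=True)
# start with an all-False mask, then flag the first occurrences
res = np.zeros_like(array, dtype=bool)
res[i] = True
print(res)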

Out:

[ True  True  True False False False False  True False]
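As a minimal sketch, the same idea can be wrapped in a small helper and applied to the array from the question. The name first_occurrence_mask is hypothetical (it is not a NumPy function), and the sketch assumes a one-dimensional input, because np.unique reports indices into the flattened array.

```python
import numpy as np

def first_occurrence_mask(values):
    """Return a boolean mask that is True where a value appears for the first time.

    Hypothetical helper for illustration; assumes a 1-D input.
    """
    arr = np.asarray(values)
    # return_index=True yields the index of the first occurrence of each unique value
    _, first_idx = np.unique(arr, return_index=True)
    mask = np.zeros(arr.shape, dtype=bool)
    mask[first_idx] = True
    return mask

# The array from the question
mask = first_occurrence_mask(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'a'])
print(mask)  # [ True  True  True False False False False False]
```

Note that np.unique sorts the values internally, so the returned indices are ordered by value rather than by position. That does not matter here because they are only used to set entries of the mask.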

If using pandas is acceptable, there is a convenience method called duplicated() that can be called on a Series.

Wrap the NumPy array in a Series, call duplicated(), negate the result, and convert the resulting boolean Series back to a NumPy array.

Example:

import numpy as np
import pandas as pd

a = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

(~pd.Series(a).duplicated(keep='first')).to_numpy()

Output:

array([ True, True, True, False, False, False, False, True, False])
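To check that the pandas route and the np.unique route give the same mask, a short comparison along these lines may help. The helper name first_occurrence_mask_pd is hypothetical, and the sketch assumes both numpy and pandas are installed.

```python
import numpy as np
import pandas as pd

def first_occurrence_mask_pd(values):
    """Hypothetical helper: True where a value appears for the first time."""
    # duplicated(keep='first') flags every repeat after the first occurrence,
    # so negating it leaves True only on first occurrences
    return (~pd.Series(values).duplicated(keep='first')).to_numpy()

a = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

# Reference mask built with np.unique(..., return_index=True)
_, idx = np.unique(a, return_index=True)
expected = np.zeros(a.shape, dtype=bool)
expected[idx] = True

pd_mask = first_occurrence_mask_pd(a)
print(np.array_equal(pd_mask, expected))  # True
```

The pandas version avoids the sort that np.unique performs, which may matter for large arrays, though that depends on the data and is worth measuring rather than assuming.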
