I want to flag the first occurrence of each item in an array, so that every later duplicate is marked False.
For example:
array = [a, b, c, b, b, a, c, a]
returns: [True, True, True, False, False, False, False, False]
I have tried to use the np.unique function but it either returns unique values or returns indices of unique values.
Is there any function that is able to do this?
You had a good approach with np.unique. With return_index=True, it also returns the index of the first occurrence of each unique value, which is the information you need.
I augmented your example to show that this works generally independent of the positions of unique values.
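import numpy as np

array = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])
# return_index=True also returns the index of the first occurrence of each unique value
_, i = np.unique(array, return_index=True)
# start from an all-False mask and set the first occurrences to True
res = np.zeros_like(array, dtype=bool)
res[i] = True
print(res)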
Out:
[ True  True  True False False False False  True False]
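As a minimal sketch, the same idea can be wrapped in a small helper and applied to the array from the question. The name first_occurrence_mask is hypothetical (it is not a NumPy function), and the sketch assumes a 1-D input.

```python
import numpy as np

def first_occurrence_mask(values):
    """Hypothetical helper: True at the first occurrence of each value (1-D input assumed)."""
    values = np.asarray(values)
    # np.unique with return_index=True gives the index of the first occurrence of each unique value
    _, first_idx = np.unique(values, return_index=True)
    mask = np.zeros(values.shape, dtype=bool)
    mask[first_idx] = True
    return mask

# The array from the question
question_array = ['a', 'b', 'c', 'b', 'b', 'a', 'c', 'a']
mask = first_occurrence_mask(question_array)
print(mask.tolist())
# [True, True, True, False, False, False, False, False]
```

This reproduces the result asked for in the question. Note that np.unique sorts the unique values internally, but the mask is still aligned with the original order because only the returned indices are used.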
If it's OK to use pandas, there is a convenience function called duplicated() which can be used on a Series.
Essentially, wrap the numpy array in the Series constructor, call duplicated(), negate the result, and convert the boolean Series back to a numpy array.
Example:
import numpy as np
import pandas as pd
a = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])
(~pd.Series(a).duplicated(keep='first')).to_numpy()
Output:
array([ True,  True,  True, False, False, False, False,  True, False])
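To check that the pandas route agrees with the np.unique route, a quick comparison can be sketched as follows. The helper name first_occurrence_mask_pd is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

def first_occurrence_mask_pd(values):
    """Hypothetical helper: True at the first occurrence of each value, via pandas."""
    # duplicated(keep='first') flags every repeat after the first occurrence; ~ inverts the flags
    return (~pd.Series(values).duplicated(keep='first')).to_numpy()

a = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'c', 'd', 'a'])

# Reference mask built with np.unique and return_index
_, i = np.unique(a, return_index=True)
expected = np.zeros(a.shape, dtype=bool)
expected[i] = True

same = np.array_equal(first_occurrence_mask_pd(a), expected)
print(same)
```

If both approaches mark the same positions, this prints True.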