
Is there a way to remove rows with non-unique values from a data frame without using apply?

I have a large data frame with over a million rows where I would like to drop any row that does not contain all unique values within the row itself.

    0   1   2   4   3
0   13  3   2   0   3 # Want to drop 
1   13  72  2   13  1 # Want to drop
2   13  3   2   8   5

Is there a faster way of achieving the same result as the code below?

df[df.apply(lambda x: x.is_unique, axis=1)]
#     0  1  2  4  3
# 2  13  3  2  8  5

NumPy is known to operate significantly faster than Pandas.

So try the following code:

nCol = df.shape[1]
df[np.apply_along_axis(lambda row: np.unique(row).size == nCol, 1, df.values)]
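A fully vectorized variant can avoid the per-row Python lambda entirely: sort each row, then a row contains all-unique values exactly when no two adjacent sorted elements are equal. This is a sketch, not the answerer's code; the sample DataFrame below just reproduces the example from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (column labels 0, 1, 2, 4, 3).
df = pd.DataFrame([[13, 3, 2, 0, 3],
                   [13, 72, 2, 13, 1],
                   [13, 3, 2, 8, 5]],
                  columns=[0, 1, 2, 4, 3])

# Sort each row; a row has all-unique values iff no adjacent
# sorted elements are equal.
a = np.sort(df.values, axis=1)
mask = (a[:, 1:] != a[:, :-1]).all(axis=1)

result = df[mask]
print(result)
```

Because both the sort and the comparison run in compiled NumPy code over the whole array at once, this should scale better on a million-row frame than any approach that calls a Python function per row.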

My comparison of execution times, using %timeit, indicates that my code is about 3 times faster than yours.

For a bigger source DataFrame this difference can be even greater. Check on your own and then post the result in a comment.
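A comparison along these lines can be reproduced with the standard-library `timeit` module instead of the IPython `%timeit` magic. The DataFrame shape and random data below are illustrative assumptions, not the asker's actual data:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 rows of small random integers,
# so that duplicate values within a row are common.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(10_000, 5)))

nCol = df.shape[1]

def with_apply():
    # The original row-wise pandas approach.
    return df[df.apply(lambda x: x.is_unique, axis=1)]

def with_numpy():
    # The np.apply_along_axis approach from the answer.
    return df[np.apply_along_axis(lambda row: np.unique(row).size == nCol,
                                  1, df.values)]

# Sanity check: both approaches must select the same rows
# before the timings mean anything.
assert with_apply().equals(with_numpy())

t_apply = timeit.timeit(with_apply, number=3)
t_numpy = timeit.timeit(with_numpy, number=3)
print(f"apply: {t_apply:.3f}s  numpy: {t_numpy:.3f}s")
```

Absolute timings will vary by machine and pandas/NumPy version, which is presumably why the answer suggests checking on your own data.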

By the way, I also checked the solution proposed by enke, but it seems to be slower than your code.

