I want to perform a row by row comparison over multiple columns. I want a single series, indicating if all entries in a row (over several columns) are the same as the previous row.
Let's say I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': [2, 2, 3, 3, 3],
                   'C': [1, 1, 1, 2, 2]})
I can compare each row with the previous one, column by column:
>>> df.diff().eq(0)
       A      B      C
0  False  False  False
1   True   True   True
2   True  False   True
3  False   True  False
4   True   True   True
This gives a dataframe comparing each series individually. What I want is the comparison of all columns in one series.
I can achieve this by looping over the columns:
compare_all = df.diff().eq(0)
compare_tot = compare_all[compare_all.columns[0]]
for c in compare_all.columns[1:]:
    compare_tot = compare_tot & compare_all[c]
This gives
>>> compare_tot
0 False
1 True
2 False
3 False
4 True
dtype: bool
as expected.
Is it possible to achieve this with a one-liner, that is, without the loop?
>>> (df == df.shift()).all(axis=1)
0 False
1 True
2 False
3 False
4 True
dtype: bool
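This works because df.shift() moves every row down by one position, so df == df.shift() compares each row with its predecessor element by element, and all(axis=1) then collapses each row to a single boolean. Here is a minimal sketch of the same idea restricted to a subset of columns. The string column D and the cols list are hypothetical additions for illustration and are not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': [2, 2, 3, 3, 3],
                   'C': [1, 1, 1, 2, 2],
                   'D': ['x', 'x', 'x', 'y', 'y']})  # hypothetical string column

# Hypothetical subset of columns to compare; B is deliberately left out.
cols = ['A', 'C', 'D']
sub = df[cols]

# Compare each row with the previous one, then require equality in every column.
# The first row has no predecessor, so it compares against missing values
# and comes out as False.
same_as_prev = (sub == sub.shift()).all(axis=1)
print(same_as_prev)
```

Since this comparison only relies on equality, it should also handle non-numeric columns such as D here.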
You need all(1), which reduces across the columns of each row:
In [1306]: df.diff().eq(0).all(1)
Out[1306]:
0 False
1 True
2 False
3 False
4 True
dtype: bool
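As a quick sanity check, the sketch below compares the diff-based and shift-based one-liners against the loop from the question on the sample data. It assumes all columns are numeric: diff() subtracts consecutive rows, so it is not expected to work on string columns, whereas the shift comparison only needs equality. The names via_diff, via_shift and agree are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': [2, 2, 3, 3, 3],
                   'C': [1, 1, 1, 2, 2]})

# Loop version from the question: AND the per-column results together.
compare_all = df.diff().eq(0)
compare_tot = compare_all[compare_all.columns[0]]
for c in compare_all.columns[1:]:
    compare_tot = compare_tot & compare_all[c]

# The two one-liners.
via_diff = df.diff().eq(0).all(axis=1)
via_shift = (df == df.shift()).all(axis=1)

# Compare the plain boolean values of all three results.
agree = via_diff.tolist() == via_shift.tolist() == compare_tot.tolist()
print(agree)
```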