Compare two columns with NaNs in Pandas and get differences
I have the following dataframe:
case c1 c2
1 x x
2 NaN y
3 x NaN
4 y x
5 NaN NaN
I would like to add a column "match" that shows, for each record, whether the values in "c1" and "c2" are equal or different (two NaNs count as equal):
case c1 c2 match
1 x x True
2 NaN y False
3 x NaN False
4 y x False
5 NaN NaN True
I tried the following, based on another Stack Overflow question ("Comparing two columns and keeping NaNs"). However, I can't get both case 4 and case 5 correct.
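import pandas as pd
import numpy as np
df = pd.DataFrame({
    'case': [1, 2, 3, 4, 5],
    'c1': ['x', np.nan, 'x', 'y', np.nan],
    'c2': ['x', 'y', np.nan, 'x', np.nan],
})
# True where both values are present and equal (NaN == NaN is False)
cond1 = df['c1'] == df['c2']
# True where both values are missing, but ALSO where both are present,
# which is why case 4 ('y' vs 'x') comes out True
cond2 = (df['c1'].isnull()) == (df['c2'].isnull())
df['c3'] = np.select([cond1, cond2], [True, True], False)
df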
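A minimal sketch of one way to repair the attempt above, assuming the same df: the second condition should require that both values are missing (combine the two isnull() masks with &) instead of comparing the masks with ==. The conditions can then be combined with | rather than np.select, and the result goes into the "match" column from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'case': [1, 2, 3, 4, 5],
    'c1': ['x', np.nan, 'x', 'y', np.nan],
    'c2': ['x', 'y', np.nan, 'x', np.nan],
})

# Equal non-missing values; NaN == NaN is False, so this alone misses case 5
cond1 = df['c1'] == df['c2']
# Both values missing; unlike comparing the isnull() masks with ==,
# this is False when both values are present (case 4)
cond2 = df['c1'].isnull() & df['c2'].isnull()

df['match'] = cond1 | cond2
print(df)
```

np.select([cond1, cond2], [True, True], False) would give the same result with the corrected cond2; the | form is simply shorter for a boolean outcome.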
Use eq with isna:
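# eq treats NaN != NaN, so add the rows where both c1 and c2 are missing
df.c1.eq(df.c2) | df.iloc[:, 1:].isna().all(axis=1)
# or, selecting the columns by name
df.c1.eq(df.c2) | df.loc[:, ['c1', 'c2']].isna().all(axis=1)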
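A self-contained sketch of the eq/isna approach that stores the result in a "match" column. It selects c1 and c2 by name; with iloc[:, 1:], a column added later (such as "match" itself on a second run) would also be checked by isna().all(axis=1), and case 5 would then come out False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'case': [1, 2, 3, 4, 5],
    'c1': ['x', np.nan, 'x', 'y', np.nan],
    'c2': ['x', 'y', np.nan, 'x', np.nan],
})

# Element-wise equality; NaN never equals NaN, so case 5 is False here
equal = df['c1'].eq(df['c2'])
# True only where every selected column is missing
both_missing = df[['c1', 'c2']].isna().all(axis=1)

df['match'] = equal | both_missing
print(df[['case', 'match']])
```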
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'case': [1, 2, 3, 4, 5],
    'c1': ['x', np.nan, 'x', 'y', np.nan],
    'c2': ['x', 'y', np.nan, 'x', np.nan],
})
# str(np.nan) is 'nan', so two missing values compare equal as strings
df['c3'] = df.apply(lambda row: str(row.c1) == str(row.c2), axis=1)
print(df)
Output
case c1 c2 c3
0 1 x x True
1 2 NaN y False
2 3 x NaN False
3 4 y x False
4 5 NaN NaN True
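One caveat of the string-based comparison above: because it compares str() representations, a cell that actually contains the text 'nan' is indistinguishable from a missing value. The small frame below is hypothetical data made up to show this; the eq/isna comparison keeps the two apart.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 holds the literal string 'nan', not a missing value
df = pd.DataFrame({
    'c1': ['x', 'nan'],
    'c2': ['x', np.nan],
})

# String-based comparison: str(np.nan) == 'nan', so row 2 is reported as a match
via_str = df.apply(lambda row: str(row.c1) == str(row.c2), axis=1)

# eq/isna comparison: only genuinely missing values are treated as equal
via_isna = df['c1'].eq(df['c2']) | df[['c1', 'c2']].isna().all(axis=1)

print(via_str.tolist())   # [True, True]
print(via_isna.tolist())  # [True, False]
```

The apply version also runs a Python function per row, so the vectorized comparison is usually the better choice on larger frames.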
Use nunique with fillna:
import numpy as np
# Fill NaN with a placeholder (np.inf) so that two missing values count as one distinct value
df.fillna(np.inf)[['c1','c2']].nunique(axis=1) < 2
Or use nunique with the option dropna=False:
# dropna=False counts NaN as a value, so (NaN, NaN) has 1 distinct value and (NaN, 'y') has 2
df[['c1','c2']].nunique(axis=1, dropna=False) < 2
Out[13]:
0 True
1 False
2 False
3 False
4 True
dtype: bool
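The nunique approach also extends to more than two columns: a row "matches" when all selected columns hold the same value, with NaN counted as a value. A sketch on a hypothetical three-column frame (the column c3 and the data are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical three-column example extending the two-column case
df = pd.DataFrame({
    'c1': ['x', np.nan, 'x', np.nan],
    'c2': ['x', np.nan, 'x', 'y'],
    'c3': ['x', np.nan, 'z', np.nan],
})

# With dropna=False, NaN counts as one distinct value, so an all-NaN row
# has exactly one distinct value and is reported as a match
all_same = df[['c1', 'c2', 'c3']].nunique(axis=1, dropna=False) == 1
print(all_same.tolist())  # [True, True, False, False]
```

For a non-empty row with dropna=False, "< 2" and "== 1" are equivalent. With the default dropna=True, the row (NaN, 'y') would count only one distinct value and be reported as a match, which is why the first variant fills NaN beforehand.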