Python pandas lambda compare two columns in dataframe
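I have two columns of currencies, a and b;
some cells in both columns are empty.
I want to create a third column c that applies the following logic: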
if a == b then display 'same'
elif a == None then display 'a missing'
elif b == None then display 'b missing'
elif a == None and b == None then display 'all missing'
else 'diff currency'.
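My code is below. It only ever returns 'Same' or 'diff currency', never any of the cases in between.
Please point out the flaw in my syntax or logic. Thank you very much!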
import pandas as pd
# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())
b = list(('USD CAD RMB HKD nan nan USD EUR').split())
# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])
df = df.replace('nan', '')
df['c'] = df.apply(lambda x: 'Same' if x['a'] == x['b']
else ('a missing' if x['a']==None
else ('b missing' if x['b']==None
else ('a & b missing' if x['a']==None and x['b']==None
else 'diff currency'))), axis=1)
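For reference, a sketch of why the lambda above never reaches the "missing" branches. After `df.replace('nan', '')` the empty cells hold `''`, not `None`, so `== None` is never true, and two empty cells compare equal and come out as 'Same'. The "both missing" test also sits after the single-column tests, so it could never be reached. A row-wise version that addresses these points, assuming the missing cells are converted to real NaN instead of `''` (the helper name `classify` is hypothetical):

```python
import numpy as np
import pandas as pd

a = 'USD USD CAD nan JMD nan HKD CAD'.split()
b = 'USD CAD RMB HKD nan nan USD EUR'.split()

# Assumption: turn the 'nan' strings into real NaN rather than ''.
df = pd.DataFrame({'a': a, 'b': b}).replace('nan', np.nan)

def classify(row):
    """Label one row; the 'both missing' case is checked first."""
    a_missing = pd.isna(row['a'])
    b_missing = pd.isna(row['b'])
    if a_missing and b_missing:
        return 'a & b missing'
    if a_missing:
        return 'a missing'
    if b_missing:
        return 'b missing'
    return 'Same' if row['a'] == row['b'] else 'diff currency'

df['c'] = df.apply(classify, axis=1)
print(df)
```

This keeps the row-by-row style of the question; it is not vectorized, so it will be slower on large frames.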
It would be better to learn the vectorized functions. They are idiomatic pandas and extremely fast. Use np.select:
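import numpy as np

a = df["a"]
b = df["b"]
# assumes missing cells are real NaN (e.g. df.replace('nan', np.nan)), not ''
df["c"] = np.select(
    # a == b rather than np.isclose: the columns hold strings, not numbers
    [a.isna() & b.isna(), a.isna(), b.isna(), a == b],
    ["all missing", "a missing", "b missing", "same"],
    "diff currency",
)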
You can use np.select for this.
import pandas as pd
import numpy as np
# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())
b = list(('USD CAD RMB HKD nan nan USD EUR').split())
# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])
# change string `nan` into actual NaN values
df = df.replace('nan', np.nan)
condlist = [df.a == df.b, df.isna().all(axis=1), df.a.isna(), df.b.isna()]
choicelist = ['same', 'all missing', 'a missing', 'b missing']
df['c'] = np.select(condlist,choicelist,default='diff currency')
print(df)
     a    b              c
0  USD  USD           same
1  USD  CAD  diff currency
2  CAD  RMB  diff currency
3  NaN  HKD      a missing
4  JMD  NaN      b missing
5  NaN  NaN    all missing
6  HKD  USD  diff currency
7  CAD  EUR  diff currency
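The order of the conditions passed to np.select matters: for each row it takes the choice belonging to the first condition that is true. A small sketch of that behavior on a three-row frame made up for illustration (the names `good` and `bad` are hypothetical):

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame: equal, a missing, both missing.
df = pd.DataFrame({'a': ['USD', np.nan, np.nan],
                   'b': ['USD', 'HKD', np.nan]})
a_na, b_na = df['a'].isna(), df['b'].isna()

# "both missing" is tested before the single-column checks.
good = np.select(
    [a_na & b_na, a_na, b_na, df['a'] == df['b']],
    ['all missing', 'a missing', 'b missing', 'same'],
    default='diff currency',
)

# Single-column checks first: the "both missing" row is caught by a_na.
bad = np.select(
    [a_na, b_na, a_na & b_na, df['a'] == df['b']],
    ['a missing', 'b missing', 'all missing', 'same'],
    default='diff currency',
)

print(list(good))
print(list(bad))
```

With the second ordering the row where both cells are missing is labelled 'a missing', because `a_na` matches first. The equality test can go anywhere in the list, since NaN == NaN evaluates to False.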