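Python pandas lambda compare two columns in dataframe
I have two columns of currencies, a and b.
Some cells in both columns are empty.
I would like to create a third column c that applies the following logic: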
if a == b then display 'same'
elif a == None then display 'a missing'
elif b == None then display 'b missing'
elif a == None and b == None then display 'all missing'
else 'diff currency'.
My code is below. It only ever returns 'Same' or 'diff currency', nothing in between.
Please shed some light on my syntax or logic flaws. Thank you so much!
import pandas as pd
# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())
b = list(('USD CAD RMB HKD nan nan USD EUR').split())
# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])
df = df.replace('nan', '')
df['c'] = df.apply(lambda x: 'Same' if x['a'] == x['b']
else ('a missing' if x['a']==None
else ('b missing' if x['b']==None
else ('a & b missing' if x['a']==None and x['b']==None
else 'diff currency'))), axis=1)
It would be better to learn to use vectorized functions. They are idiomatic pandas and extremely fast. Use np.select:
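import numpy as np

# assumes empty cells are real NaN values, e.g. df = df.replace('nan', np.nan)
a = df["a"]
b = df["b"]
df["c"] = np.select(
    # a == b rather than np.isclose: the columns hold strings, not numbers
    [a.isna() & b.isna(), a.isna(), b.isna(), a == b],
    ["all missing", "a missing", "b missing", "same"],
    "diff currency",
)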
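The order of the conditions matters in a call like this: for each row, np.select returns the choice belonging to the first condition that is True, so the "both missing" test has to be listed before the single-column tests. A minimal sketch with made-up values (the two-row Series `a` and `b` below are illustrative, not the asker's data):

```python
import numpy as np
import pandas as pd

# Illustrative data: the second row has both values missing.
a = pd.Series(["USD", np.nan])
b = pd.Series(["USD", np.nan])

# "all missing" listed first: it is picked for the second row.
good = np.select(
    [a.isna() & b.isna(), a.isna()],
    ["all missing", "a missing"],
    "other",
)

# "a missing" listed first: it shadows "all missing" on the second row.
bad = np.select(
    [a.isna(), a.isna() & b.isna()],
    ["a missing", "all missing"],
    "other",
)

print(good)
print(bad)
```

The same shadowing affects the if/elif chain in the question, where the combined check comes after the single-column checks.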
You can use np.select for this.
import pandas as pd
import numpy as np
# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())
b = list(('USD CAD RMB HKD nan nan USD EUR').split())
# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])
# change string `nan` into actual NaN values
df = df.replace('nan', np.nan)
condlist = [df.a == df.b, df.isna().all(axis=1), df.a.isna(), df.b.isna()]
choicelist = ['same', 'all missing', 'a missing', 'b missing']
df['c'] = np.select(condlist,choicelist,default='diff currency')
print(df)
     a    b              c
0  USD  USD           same
1  USD  CAD  diff currency
2  CAD  RMB  diff currency
3  NaN  HKD      a missing
4  JMD  NaN      b missing
5  NaN  NaN    all missing
6  HKD  USD  diff currency
7  CAD  EUR  diff currency
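As for why the lambda in the question never reports a missing value: after df.replace('nan', '') the empty cells hold '' rather than None, so x['a'] == None is never True (and it would not be True for a real NaN either). Two empty cells also compare equal, so they get labelled 'Same', and the combined check sits after the single-column checks, where it could never be reached. If you want to keep a row-wise apply, a sketch along these lines should work, assuming missing cells are converted to NaN first; `classify` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

a = 'USD USD CAD nan JMD nan HKD CAD'.split()
b = 'USD CAD RMB HKD nan nan USD EUR'.split()
# turn the string 'nan' into real NaN so pd.isna can detect it
df = pd.DataFrame({'a': a, 'b': b}).replace('nan', np.nan)

def classify(row):
    # hypothetical helper: test for missing values before testing equality
    a_missing, b_missing = pd.isna(row['a']), pd.isna(row['b'])
    if a_missing and b_missing:
        return 'all missing'
    if a_missing:
        return 'a missing'
    if b_missing:
        return 'b missing'
    return 'same' if row['a'] == row['b'] else 'diff currency'

df['c'] = df.apply(classify, axis=1)
print(df)
```

This will be slower than np.select on large frames, but it stays close to the original if/elif logic.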