Python pandas lambda compare two columns in dataframe

I have two columns of currencies, a and b.

Some cells are empty in both columns.

I am hoping to create a third column c that implements the following logic:

if a == b then display 'same'
elif a == None then display 'a missing'
elif b == None then display 'b missing'
elif a == None and b == None then display 'all missing'
else 'diff currency'.

This is my code below. It only returns 'Same' or 'diff currency', nothing in between.

Please shed some light on my syntax or logic flaws here. Thank you so much!

import pandas as pd

# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())

b = list(('USD CAD RMB HKD nan nan USD EUR').split())


# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])

df = df.replace('nan', '')


df['c'] = df.apply(lambda x: 'Same' if x['a'] == x['b'] 
                   else ('a missing' if x['a']==None
                         else ('b missing' if x['b']==None 
                         else ('a & b missing' if x['a']==None and x['b']==None
                         else 'diff currency'))), axis=1)

It's better if you learn how to use the vectorized functions. They are both idiomatic pandas and extremely fast. Use np.select:

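import numpy as np

# Assumes missing values are real NaN, e.g. df = df.replace('nan', np.nan)
a = df["a"]
b = df["b"]
df["c"] = np.select(
    # Conditions are checked in order; the first match wins.
    # a == b replaces np.isclose, which only works on numeric data.
    [a.isna() & b.isna(), a.isna(), b.isna(), a == b],
    ["all missing", "a missing", "b missing", "same"],
    "diff currency",
)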
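As a side note on why the order of the conditions matters: np.select takes the first condition that matches for each row, so the most specific check ("both missing") has to come before the single-column checks. A minimal sketch on a toy frame (the three rows below are made up for illustration and are not the data from the question):

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; None is treated as a missing value
df = pd.DataFrame({"a": ["USD", None, None], "b": ["USD", "HKD", None]})
a, b = df["a"], df["b"]

# Most specific condition first: "both missing" is reachable
good = np.select(
    [a.isna() & b.isna(), a.isna(), b.isna(), a == b],
    ["all missing", "a missing", "b missing", "same"],
    "diff currency",
)

# Wrong order: a.isna() matches first and shadows "both missing"
bad = np.select(
    [a.isna(), a.isna() & b.isna()],
    ["a missing", "all missing"],
    "diff currency",
)

print(list(good))
print(list(bad))
```

With the second ordering, the last row is labelled 'a missing' even though both values are missing, which is the same kind of unreachable branch as in the nested lambda from the question.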

You can use np.select for this.

import pandas as pd
import numpy as np

# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())

b = list(('USD CAD RMB HKD nan nan USD EUR').split())

# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])

# change string `nan` into actual NaN values
df = df.replace('nan', np.nan)

condlist = [df.a == df.b, df.isna().all(axis=1), df.a.isna(), df.b.isna()]
choicelist = ['same', 'all missing', 'a missing', 'b missing']

df['c'] = np.select(condlist, choicelist, default='diff currency')
print(df)

     a    b              c
0  USD  USD           same
1  USD  CAD  diff currency
2  CAD  RMB  diff currency
3  NaN  HKD      a missing
4  JMD  NaN      b missing
5  NaN  NaN    all missing
6  HKD  USD  diff currency
7  CAD  EUR  diff currency
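If you would rather keep a row-wise apply, the original lambda has two problems: after df.replace('nan', '') the empty cells are empty strings, so x['a'] == None is never true (and two empty strings compare equal, giving 'Same'), and the "both missing" branch comes after the single-column branches, so it can never be reached. A minimal sketch of a fixed row-wise version, assuming the missing cells are converted to real NaN first (classify is a hypothetical helper name, not something from the question):

```python
import numpy as np
import pandas as pd

a = 'USD USD CAD nan JMD nan HKD CAD'.split()
b = 'USD CAD RMB HKD nan nan USD EUR'.split()

# Turn the string 'nan' into a real missing value so pd.isna can detect it
df = pd.DataFrame({'a': a, 'b': b}).replace('nan', np.nan)

def classify(row):
    # Hypothetical helper; checks run from most specific to least specific
    a_missing = pd.isna(row['a'])
    b_missing = pd.isna(row['b'])
    if a_missing and b_missing:
        return 'all missing'
    if a_missing:
        return 'a missing'
    if b_missing:
        return 'b missing'
    return 'same' if row['a'] == row['b'] else 'diff currency'

df['c'] = df.apply(classify, axis=1)
print(df)
```

This is slower than np.select on large frames because apply calls the function once per row, but it keeps the if/elif shape of the logic described in the question.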
