
Problem when applying a function on a pandas dataframe column

I have these two dataframes:

df = pd.DataFrame({'Points' : ['A','B','C','D','E'],'ColY' : [1,2,3,4,5]})
df
    Points  ColY
0       A      1
1       B      2
2       C      3
3       D      4
4       E      5

df2 = pd.DataFrame({'Points' : ['A','D'],'ColX' : [2,9]})
df2
    Points  ColX
0       A      2
1       D      9

And these two functions:

# equivalent of the Excel vlookup function applied to a dataframe
def vlookup(df,ref,col_ref,col_goal):
    return pd.DataFrame(df[df.apply(lambda x: ref == x[col_ref],axis=1)][col_goal]).iloc[0,0]

# if x is in column Points of df2, return what is in column ColX in the same row
def update_if_belong_to_df2(x):
    if x in df2['Points']:
        return vlookup(df2,x,'Points','ColX')
    return x

I would like to apply the function update_if_belong_to_df2 to the column ColY of df. I tried the following, but it doesn't work:

df['ColY'] = df['ColY'].apply(lambda x : update_if_belong_to_df2(x))

I would like to get:

df
    Points  ColY
0       A      2
1       B      2
2       C      3
3       D      9
4       E      5

Could you please help me understand why it doesn't work? Thanks.
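A likely explanation, as a minimal sketch rather than a definitive diagnosis: `x in df2['Points']` tests the index labels of the Series (0 and 1 here), not its values, and the function is applied to ColY, so the numbers 1 to 5 are compared instead of the letters in Points. The names `lookup`, `update_row` and `fixed` below are hypothetical and only used for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Points': ['A', 'B', 'C', 'D', 'E'], 'ColY': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Points': ['A', 'D'], 'ColX': [2, 9]})

# `in` on a Series checks the index labels, not the stored values
label_hit = 0 in df2['Points']            # 0 is an index label of df2
value_miss = 'A' in df2['Points']         # 'A' is a value, not a label
value_hit = 'A' in df2['Points'].values   # membership test against the values

# Hypothetical row-wise fix: decide on Points, fall back to the existing ColY
lookup = dict(zip(df2['Points'], df2['ColX']))

def update_row(row):
    # Return ColX when the point is listed in df2, otherwise keep ColY
    return lookup.get(row['Points'], row['ColY'])

fixed = df.assign(ColY=df.apply(update_row, axis=1))
print(fixed)
```

With the original code, the value 1 from ColY happens to match index label 1 of df2, so vlookup is called with a reference that matches no row, and the `.iloc[0,0]` on the resulting empty frame is probably where the error comes from.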

I would use merge:

df=df.merge(df2,how='left')
df.ColX=df.ColX.fillna(df.ColY)
df
  Points  ColY  ColX
0      A     1   2.0
1      B     2   2.0
2      C     3   3.0
3      D     4   9.0
4      E     5   5.0
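The merged frame above still has both columns, and ColX is float because the unmatched rows held NaN before the fill. A possible way to get back to the requested two-column layout, as a sketch only (the names `merged` and `result` are hypothetical, and the cast to int assumes every value is a whole number):

```python
import pandas as pd

df = pd.DataFrame({'Points': ['A', 'B', 'C', 'D', 'E'], 'ColY': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Points': ['A', 'D'], 'ColX': [2, 9]})

# Left join keeps every row of df; ColX is NaN where df2 has no match
merged = df.merge(df2, on='Points', how='left')

# Prefer ColX, fall back to ColY, then restore the integer dtype
merged['ColY'] = merged['ColX'].fillna(merged['ColY']).astype(int)

# Drop the helper column so only Points and ColY remain
result = merged.drop(columns='ColX')
print(result)
```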

IIUC, your problem is easier with map and fillna:

df['ColY'] = (df['Points'].map(df2.set_index('Points')['ColX'])
                   .fillna(df['ColY'])
              )

Output:输出:

  Points  ColY
0      A   2.0
1      B   2.0
2      C   3.0
3      D   9.0
4      E   5.0
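To see why this works, the one-liner can be split into its steps. This is only an illustrative breakdown; `lookup`, `mapped` and `filled` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'Points': ['A', 'B', 'C', 'D', 'E'], 'ColY': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Points': ['A', 'D'], 'ColX': [2, 9]})

# Series indexed by Points, so it can act as a lookup table
lookup = df2.set_index('Points')['ColX']

# Each point is replaced by its ColX value; points missing from df2 become NaN
mapped = df['Points'].map(lookup)

# NaN positions are filled from ColY, aligned on the row index of df
filled = mapped.fillna(df['ColY'])
print(filled)
```

The NaN values in the intermediate step are what turn the column into floats in the output above.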

Use pandas update instead:

df = pd.DataFrame({'Points' : ['A','B','C','D','E'],'ColY' : [1,2,3,4,5]})
df2 = pd.DataFrame({'Points' : ['A','D'],'ColX' : [2,9]})

df = df.set_index('Points')
df.update(df2.set_index('Points').rename(columns={'ColX': 'ColY'}))

df.reset_index()

  Points  ColY
0      A   2.0
1      B   2.0
2      C   3.0
3      D   9.0
4      E   5.0
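One detail worth keeping in mind with this approach: `DataFrame.update` modifies the frame in place and returns None, so its result should not be assigned, while `reset_index()` returns a new frame that has to be stored if it is needed later. A small sketch under those assumptions (`df_indexed`, `returned` and `result` are hypothetical names):

```python
import pandas as pd

df = pd.DataFrame({'Points': ['A', 'B', 'C', 'D', 'E'], 'ColY': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Points': ['A', 'D'], 'ColX': [2, 9]})

df_indexed = df.set_index('Points')

# update aligns on the index and on column names, hence the rename to ColY
returned = df_indexed.update(
    df2.set_index('Points').rename(columns={'ColX': 'ColY'})
)

# reset_index returns a new frame; keep it in a variable
result = df_indexed.reset_index()
print(result)
```

Depending on the pandas version, ColY may come back as float (as in the output above) or stay integer.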
