
Replace column values in one pandas DataFrame with a column from another pandas DataFrame, with conditions

I have the following 3 pandas DataFrames. I want to replace the values in the company and division columns with the IDs from the respective company and division DataFrames.

pd_staff:

id    name      company         division
P001  John      Sunrise         Headquarter
P002  Jane      Falcon Digital  Research & Development
P003  Joe       Ashford         Finance
P004  Adam      Falcon Digital  Sales
P004  Barbara   Sunrise         Human Resource


pd_company:

id  name
1   Sunrise
2   Falcon Digital
3   Ashford


pd_division:

id  name
1   Headquarter
2   Research & Development
3   Finance
4   Sales
5   Human Resource

This is the end result that I am trying to produce:

id    name      company   division
P001  John      1         1
P002  Jane      2         2
P003  Joe       3         3
P004  Adam      2         4
P004  Barbara   1         5

I have tried to combine Staff and Company using this code:

pd_staff.loc[pd_staff['company'].isin(pd_company['name']), 'company'] = pd_company.loc[pd_company['name'].isin(pd_staff['company']), 'id']

which produces:

id    name      company   
P001  John      1.0        
P002  Jane      NaN         
P003  Joe       NaN         
P004  Adam      NaN       
P004  Barbara   NaN     

You can do:

pd_staff['company'] = pd_staff['company'].map(pd_company.set_index('name')['id'])
pd_staff['division'] = pd_staff['division'].map(pd_division.set_index('name')['id'])

print(pd_staff):

     id     name  company  division
0  P001     John        1         1
1  P002     Jane        2         2
2  P003      Joe        3         3
3  P004     Adam        2         4
4  P004  Barbara        1         5
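
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same `map` idea. The DataFrame construction is my own reconstruction of the sample tables in the question (not the asker's actual loading code), and the names `company_ids`, `division_ids` and `result` are just illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the tables in the question (an assumption).
pd_staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
    "division": ["Headquarter", "Research & Development", "Finance",
                 "Sales", "Human Resource"],
})
pd_company = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Sunrise", "Falcon Digital", "Ashford"],
})
pd_division = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Headquarter", "Research & Development", "Finance",
             "Sales", "Human Resource"],
})

# Build name -> id lookup Series, indexed by name.
company_ids = pd_company.set_index("name")["id"]
division_ids = pd_division.set_index("name")["id"]

# assign() returns a new frame, so pd_staff itself is left untouched.
result = pd_staff.assign(
    company=pd_staff["company"].map(company_ids),
    division=pd_staff["division"].map(division_ids),
)
print(result)
```

One thing to keep in mind: `map` returns NaN for any value that has no entry in the lookup, so a misspelled company name would show up as NaN (and turn the column into floats) rather than raising an error.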

This will achieve the desired result (here df, df2 and df3 are the staff, company and division DataFrames):

df_merge = df.merge(df2, how = 'inner', right_on = 'name', left_on = 'company', suffixes=('', '_y'))
df_merge = df_merge.merge(df3, how = 'inner', left_on = 'division', right_on = 'name', suffixes=('', '_z'))
df_merge = df_merge[['id', 'name', 'id_y', 'id_z']]
df_merge.columns = ['id', 'name', 'company', 'division']
df_merge = df_merge.sort_values('id')  # sort_values returns a new DataFrame, so assign it back

First, let's modify the company and division DataFrames (df2 and df3) a little:

df2.rename(columns={'name':'company'},inplace=True)
df3.rename(columns={'name':'division'},inplace=True)

Then:

df1=df1.merge(df2,on='company',how='left').merge(df3,on='division',how='left')
df1=df1[['id_x','name','id_y','id']]
df1.rename(columns={'id_x':'id','id_y':'company','id':'division'},inplace=True)
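
A variation on the merge approach that avoids the `id_x` / `id_y` suffixes is to rename both lookup columns before merging. The sketch below is only one way to do it and rests on a few assumptions: the frames are rebuilt by hand from the question's sample data, and the names `staff`, `company_lookup`, `division_lookup` and `merged_result` are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the tables in the question (an assumption).
staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
    "division": ["Headquarter", "Research & Development", "Finance",
                 "Sales", "Human Resource"],
})
company = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Sunrise", "Falcon Digital", "Ashford"],
})
division = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Headquarter", "Research & Development", "Finance",
             "Sales", "Human Resource"],
})

# Rename the lookup columns up front so nothing collides with staff's columns.
company_lookup = company.rename(columns={"id": "company_id", "name": "company"})
division_lookup = division.rename(columns={"id": "division_id", "name": "division"})

# Left merges keep every staff row in its original order;
# validate="many_to_one" raises if a lookup table has duplicate names.
merged = (
    staff
    .merge(company_lookup, on="company", how="left", validate="many_to_one")
    .merge(division_lookup, on="division", how="left", validate="many_to_one")
)

# Drop the text columns and put the ID columns in their place.
merged_result = (
    merged
    .drop(columns=["company", "division"])
    .rename(columns={"company_id": "company", "division_id": "division"})
)
print(merged_result)
```

Using `how="left"` instead of `how="inner"` means staff rows with an unknown company or division are kept (with NaN in the ID column) instead of silently disappearing.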

Use apply with a function that replaces the values. From the second Excel file you pass the value to look up and what to replace it with. Here I am replacing Sunrise with 1 because that is its ID in the second Excel file.

import pandas as pd

df = pd.read_excel('teste.xlsx')
df2 = pd.read_excel('ids.xlsx')

def altera(df33, field='Sunrise', new_field='1'):  # defaults are for demonstration only; in practice pass the values from the second Excel file
    return df33.replace(field, new_field)


df.loc[:, 'company'] = df['company'].apply(altera)
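
To avoid hard-coding the defaults, the lookup can be built from the second frame and passed to the function. This is only a sketch: the in-memory frames stand in for the two Excel files (whose real contents I don't know), and `lookup` and the `mapping` parameter are names I've introduced for illustration.

```python
import pandas as pd

# Stand-ins for the two Excel files; the real files' contents are an assumption.
df = pd.DataFrame({
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Sunrise", "Falcon Digital", "Ashford"],
})

# Build a plain dict {company name: id} from the second frame.
lookup = dict(zip(df2["name"], df2["id"]))

def altera(value, mapping=lookup):
    # Return the ID for a known name; leave unknown names unchanged.
    return mapping.get(value, value)

# Series.apply calls altera once per cell value.
df["company"] = df["company"].apply(altera)
print(df)
```

Unlike `map`, this version leaves unmatched names as they are instead of turning them into NaN, which may or may not be what you want.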
