
How to take row values from one pandas dataframe and use them as reference to get values from another dataframe

I have two dataframes. One contains contact information for constituents. The other was created to pair up constituents that might be part of the same household.

Sample:

import pandas as pd

data1 = {'Household_0':['1234567','2345678','3456789','4567890'],
        'Individual_0':['1111111','2222222','3333333','4444444'],
        'Individual_1':['5555555','6666666','7777777','']}
df1=pd.DataFrame(data1)

data2 = {'Constituent Id':['1234567','2345678','3456789','4567890',
                           '1111111','2222222','3333333','4444444',
                           '5555555','6666666','7777777'],
         'Display Name':['Clark Kent and Lois Lane','Bruce Banner and Betty Ross',
                         'Tony Stark and Pepper Pots','Steve Rogers','Clark Kent','Bruce Banner',
                         'Tony Stark','Steve Rogers','Lois Lane','Betty Ross','Pepper Pots']}
df2=pd.DataFrame(data2)

Resulting in:

df1
  Household_0 Individual_0 Individual_1
0     1234567      1111111      5555555
1     2345678      2222222      6666666
2     3456789      3333333      7777777
3     4567890      4444444     

df2
   Constituent Id                 Display Name
0         1234567     Clark Kent and Lois Lane
1         2345678  Bruce Banner and Betty Ross
2         3456789   Tony Stark and Pepper Pots
3         4567890                 Steve Rogers
4         1111111                   Clark Kent
5         2222222                 Bruce Banner
6         3333333                   Tony Stark
7         4444444                 Steve Rogers
8         5555555                    Lois Lane
9         6666666                   Betty Ross
10        7777777                  Pepper Pots

I would like to take df1, reference the Constituent Id out of df2, and create a new dataframe that has the names of the constituents instead of their IDs, so that we can ensure they are truly family/household members.

I believe I can do this by iterating, but that seems like the wrong approach. Is there a straightforward way to do this?

You can map each column of df1 through a Series built from df2: set Constituent Id as the index and select the Display Name column. Use apply to repeat the mapping on every column. IDs with no match in df2 (such as the empty string in the last row) become NaN.

print (df1.apply(lambda x: x.map(df2.set_index('Constituent Id')['Display Name'])))
                   Household_0  Individual_0 Individual_1
0     Clark Kent and Lois Lane    Clark Kent    Lois Lane
1  Bruce Banner and Betty Ross  Bruce Banner   Betty Ross
2   Tony Stark and Pepper Pots    Tony Stark  Pepper Pots
3                 Steve Rogers  Steve Rogers          NaN
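
As a minimal sketch of the same idea, the lookup Series can be built once outside the lambda, so set_index is not re-run for every column. The names lookup, names and names_or_ids are illustrative and not from the original answer. The sketch assumes Constituent Id values are unique in df2, since Series.map with a Series argument needs a unique index. The fillna step is optional and only shows one way to keep the raw ID where no name was found.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Household_0': ['1234567', '2345678', '3456789', '4567890'],
    'Individual_0': ['1111111', '2222222', '3333333', '4444444'],
    'Individual_1': ['5555555', '6666666', '7777777', ''],
})
df2 = pd.DataFrame({
    'Constituent Id': ['1234567', '2345678', '3456789', '4567890',
                       '1111111', '2222222', '3333333', '4444444',
                       '5555555', '6666666', '7777777'],
    'Display Name': ['Clark Kent and Lois Lane', 'Bruce Banner and Betty Ross',
                     'Tony Stark and Pepper Pots', 'Steve Rogers', 'Clark Kent',
                     'Bruce Banner', 'Tony Stark', 'Steve Rogers', 'Lois Lane',
                     'Betty Ross', 'Pepper Pots'],
})

# Build the ID -> name lookup once (illustrative name: lookup).
lookup = df2.set_index('Constituent Id')['Display Name']

# Assumption: IDs are unique; mapping through a Series requires a unique index.
assert lookup.index.is_unique

# Map every column of df1 through the lookup; unmatched IDs become NaN.
names = df1.apply(lambda col: col.map(lookup))

# Optional: fall back to the original ID wherever no name was found.
names_or_ids = names.fillna(df1)

print(names)
```
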

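You can chain melt, merge and pivot_table: melt reshapes df1 into long form, merge looks up each ID in df2, and pivot_table restores the original wide layout.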

df3 = (
    df1
    .reset_index()
    .melt('index')
    .merge(df2, left_on='value', right_on='Constituent Id')
    .pivot_table(values='Display Name', index='index', columns='variable', aggfunc='last')
)
print(df3)

Output:

variable                  Household_0  Individual_0 Individual_1
index                                                           
0            Clark Kent and Lois Lane    Clark Kent    Lois Lane
1         Bruce Banner and Betty Ross  Bruce Banner   Betty Ross
2          Tony Stark and Pepper Pots    Tony Stark  Pepper Pots
3                        Steve Rogers  Steve Rogers          NaN
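
The pivoted result carries the axis names variable and index left over from melt and reset_index. A possible follow-up, sketched here as an assumption about what a tidy result should look like, drops those names with rename_axis and uses reindex to restore df1's row and column order. The reindex also brings back any row or column that the inner merge dropped because none of its IDs matched.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Household_0': ['1234567', '2345678', '3456789', '4567890'],
    'Individual_0': ['1111111', '2222222', '3333333', '4444444'],
    'Individual_1': ['5555555', '6666666', '7777777', ''],
})
df2 = pd.DataFrame({
    'Constituent Id': ['1234567', '2345678', '3456789', '4567890',
                       '1111111', '2222222', '3333333', '4444444',
                       '5555555', '6666666', '7777777'],
    'Display Name': ['Clark Kent and Lois Lane', 'Bruce Banner and Betty Ross',
                     'Tony Stark and Pepper Pots', 'Steve Rogers', 'Clark Kent',
                     'Bruce Banner', 'Tony Stark', 'Steve Rogers', 'Lois Lane',
                     'Betty Ross', 'Pepper Pots'],
})

df3 = (
    df1
    .reset_index()                      # keep the row label as a column
    .melt('index')                      # long form: index, variable, value
    .merge(df2, left_on='value', right_on='Constituent Id')
    .pivot_table(values='Display Name', index='index',
                 columns='variable', aggfunc='last')
    .rename_axis(index=None, columns=None)            # drop leftover axis names
    .reindex(index=df1.index, columns=df1.columns)    # restore df1's layout
)
print(df3)
```
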

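You can also build a dictionary from df2 and apply its get method to every cell with .applymap(). Note that the assignment below overwrites df1 in place, and IDs that are not in the dictionary come back as missing values. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.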

reference = df2.set_index('Constituent Id')['Display Name'].to_dict()
df1[df1.columns] = df1[df1.columns].applymap(reference.get)
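
Because applymap was deprecated in pandas 2.1 in favor of DataFrame.map, a version-tolerant variant might pick whichever method is available. This is only a sketch: the names elementwise and named are illustrative, and it returns a new dataframe instead of overwriting df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Household_0': ['1234567', '2345678', '3456789', '4567890'],
    'Individual_0': ['1111111', '2222222', '3333333', '4444444'],
    'Individual_1': ['5555555', '6666666', '7777777', ''],
})
df2 = pd.DataFrame({
    'Constituent Id': ['1234567', '2345678', '3456789', '4567890',
                       '1111111', '2222222', '3333333', '4444444',
                       '5555555', '6666666', '7777777'],
    'Display Name': ['Clark Kent and Lois Lane', 'Bruce Banner and Betty Ross',
                     'Tony Stark and Pepper Pots', 'Steve Rogers', 'Clark Kent',
                     'Bruce Banner', 'Tony Stark', 'Steve Rogers', 'Lois Lane',
                     'Betty Ross', 'Pepper Pots'],
})

# Plain dict of ID -> display name.
reference = df2.set_index('Constituent Id')['Display Name'].to_dict()

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
if hasattr(pd.DataFrame, 'map'):
    elementwise = df1.map
else:
    elementwise = df1.applymap

# Look up every cell; IDs missing from the dict yield a missing value.
named = elementwise(reference.get)
print(named)
```
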
