Better way to combine tables than multiple joins?

I have two dfs, df1 and df2. I need to combine the dfs in a way that might require multiple left joins, but I have a feeling there's a better way to do this.

df1 is a table of locations and the people (ID numbers) associated with them. It looks like this:

location person1 person2 person3 ... personn
1        12      450     2       ... 90
2        23      218     4       ... 3
3        1000    274     937     ... 318
....     ...     ...     ...     ... ...
1350     1       41      10      ... 101

df2 contains information about the people. It looks like this:

person year action
1      2020 a
2      2020 a
3      2020 b
4      2020 c
1000   2020 a
1      2019 c
2      2019 b
3      2019 a
4      2019 c
...    ...  ...
1000   2019 b

Ideally, I'd like the combined dataset to look like this:

location year action_a_count action_b_count action_c_count ... action_n_count
1        2020 1              0              0              ... ...
2        2020 0              1              1              ... ...
3        2020 1              0              0              ... ...
1350     2020 1              0              0              ... ...
1        2019 0              1              0              ... ...
2        2019 0              1              1              ... ...
3        2019 0              1              0              ... ...
1350     2019 0              0              1              ... ...
...      ...  ...            ...            ...            ... ...

Right now my instinct is to do a series of left joins to get the actions for each person into df1, then figure out a way to count them.

You could restructure df1 to have 2 columns, location and person. That would simplify the subsequent operations.

df1_new = df1.melt(id_vars='location', 
                   value_vars=df1.columns[1:], 
                   value_name='person')

df1_new = df1_new.drop('variable', axis=1)
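To see what the reshaping does, here is a minimal sketch on a made-up two-row stand-in for df1 (the values are illustrative, not the asker's data). It leaves out value_vars, since melt unpivots every column not listed in id_vars by default.

```python
import pandas as pd

# Toy stand-in for df1; the numbers are invented for illustration.
df1 = pd.DataFrame({
    "location": [1, 2],
    "person1": [12, 23],
    "person2": [450, 218],
})

# One row per (location, person) pair. melt also adds a "variable" column
# holding the old column names (person1, person2), which is dropped here.
df1_new = (
    df1.melt(id_vars="location", value_name="person")
       .drop(columns="variable")
)
print(df1_new)
```

Each location now appears once per person column, so the wide table with n person columns becomes a long table with n rows per location.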

Now you can join df2 and df1_new on the person column:

combined = df2.join(df1_new.set_index('person'), on='person', how='left')
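One thing worth checking after a left join is whether anyone in df2 has no location. The sketch below uses invented toy frames and DataFrame.merge, which should produce the same rows as the join above; it is an illustration under those assumptions, not the asker's actual data.

```python
import pandas as pd

# Toy long-format location table and person table (values are made up).
df1_new = pd.DataFrame({"location": [1, 1, 2], "person": [1, 2, 3]})
df2 = pd.DataFrame({
    "person": [1, 2, 3, 4],
    "year": [2020, 2020, 2020, 2020],
    "action": ["a", "a", "b", "c"],
})

# Left join keeps every row of df2; people without a location get NaN.
combined = df2.merge(df1_new, on="person", how="left")

# Rows whose person never appears in df1_new.
unmatched = combined[combined["location"].isna()]
print(unmatched)
```

pivot_table drops rows with a missing location by default, so unmatched people would simply not be counted. If a person is listed under several locations, the join repeats that person's rows once per location.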

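Then create a pivot table that counts people per action. Passing values='person' keeps the resulting columns flat (one per action), and fill_value=0 shows 0 instead of NaN for combinations that never occur:

combined.pivot_table(index=['location', 'year'], columns='action', values='person', aggfunc='count', fill_value=0)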

After the pivot table is created, you can rename the columns however you'd like.
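As a rough sketch of the renaming step, the example below builds a small made-up combined frame, pivots it, and renames the action columns to match the action_a_count pattern from the question. The toy data and the variable names counts and result are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the joined table; values are invented for illustration.
combined = pd.DataFrame({
    "person": [1, 2, 3, 4],
    "year": [2020, 2020, 2020, 2019],
    "action": ["a", "a", "b", "c"],
    "location": [1, 1, 2, 2],
})

# Count people per (location, year) and action; absent combinations become 0.
counts = combined.pivot_table(
    index=["location", "year"],
    columns="action",
    values="person",
    aggfunc="count",
    fill_value=0,
)

# Rename a, b, c to action_a_count, action_b_count, action_c_count,
# then turn location and year back into ordinary columns.
counts.columns = [f"action_{name}_count" for name in counts.columns]
result = counts.reset_index()
print(result)
```

The list comprehension adapts to however many distinct actions exist, so the action names do not need to be typed out by hand.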
