I have two dfs, df1 and df2. I need to combine the dfs in a way that might require multiple left joins, but I have a feeling there's a better way to do this.
df1 is a table of locations and the people (ID numbers) associated with them. It looks like this:
location person1 person2 person3 ... person_n
1 12 450 2 ... 90
2 23 218 4 ... 3
3 1000 274 937 ... 318
.... ... ... ... ... ...
1350 1 41 10 ... 101
df2 contains information about the people. It looks like this:
person year action
1 2020 a
2 2020 a
3 2020 b
4 2020 c
1000 2020 a
1 2019 c
2 2019 b
3 2019 a
4 2019 c
... ... ...
1000 2019 b
Ideally, I'd like the combined dataset to look like this:
location year action_a_count action_b_count action_c_count ... action_n_count
1 2020 1 0 0 ... ...
2 2020 0 1 1 ... ...
3 2020 1 0 0 ... ...
1350 2020 1 0 0 ... ...
1 2019 0 1 0 ... ...
2 2019 0 1 1 ... ...
3 2019 0 1 0 ... ...
1350 2019 0 0 1 ... ...
... ... ... ... ... ... ...
Right now my instinct is to do a series of left joins to get the actions for each person into df1, then figure out a way to count them.
You could reshape df1 into long format with two columns, location and person, so that there is one row per (location, person) pair. That makes the join and the counting much simpler.
df1_new = df1.melt(id_vars='location',
                   value_vars=df1.columns[1:],
                   value_name='person')
df1_new = df1_new.drop('variable', axis=1)
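To see what the melt step produces, here is a minimal sketch on a made-up three-person table (the ids are illustrative, not from the question). It leaves out value_vars, because melt then uses every column that is not in id_vars. The dropna call is an extra assumption: if locations with fewer people are padded with NaN in the wide table, it removes those empty slots. On this toy data it changes nothing.

```python
import pandas as pd

# Made-up wide table: one row per location, one column per person slot
df1 = pd.DataFrame({
    "location": [1, 2],
    "person1": [12, 23],
    "person2": [450, 218],
    "person3": [2, 4],
})

# Wide -> long: one row per (location, person) pair.
# value_vars is omitted, so all non-id columns are melted.
df1_new = (
    df1.melt(id_vars="location", value_name="person")
       .dropna(subset=["person"])   # assumption: NaN marks an empty person slot
       .drop(columns="variable")    # the person1/person2/... label is not needed
)
print(df1_new)
```

The rows come out column by column (all person1 values first, then person2, and so on), which does not matter for the later join.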
Now you can join df2 to df1_new on the person id:
combined = df2.join(df1_new.set_index('person'), on='person', how='left')
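A small sketch of how that join behaves, using made-up rows: a person listed at two locations gets one row per location, and a person with no location keeps a NaN location under how='left'. The merge line is an alternative phrasing I am adding, not part of the original answer. pivot_table drops rows whose grouping key is NaN by default, so the inner merge should give the same counts.

```python
import pandas as pd

# Made-up long table: person 2 appears at two locations, person 3 at none
df1_new = pd.DataFrame({"location": [1, 1, 2, 2], "person": [2, 12, 2, 4]})

# Made-up rows in the same shape as the question's df2
df2 = pd.DataFrame({
    "person": [2, 3, 4, 2, 3, 4],
    "year":   [2020, 2020, 2020, 2019, 2019, 2019],
    "action": ["a", "b", "c", "b", "a", "c"],
})

# The join from the answer: df2.person is matched against df1_new's index
via_join = df2.join(df1_new.set_index("person"), on="person", how="left")

# Equivalent merge on the shared column; inner drops people without a location
via_merge = df2.merge(df1_new, on="person", how="inner")
```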
Then create a pivot table that counts the rows for each action:
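# values='person' keeps the columns flat; fill_value=0 turns missing combinations into 0 instead of NaN
counts = combined.pivot_table(index=['location', 'year'], columns='action', values='person', aggfunc='count', fill_value=0)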
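Here is a runnable sketch of the counting step on a made-up joined table. It also shows pd.crosstab, which I am suggesting as an alternative. crosstab tallies the same (location, year) by action frequencies and fills absent combinations with 0 without any extra arguments.

```python
import pandas as pd

# Made-up joined table: one row per (person, year, location)
combined = pd.DataFrame({
    "person":   [2, 4, 3, 2, 4],
    "year":     [2020, 2020, 2020, 2019, 2019],
    "action":   ["a", "c", "b", "b", "c"],
    "location": [1, 2, 2, 1, 2],
})

# Count rows per (location, year) for each action; missing combinations become 0
counts = combined.pivot_table(
    index=["location", "year"],
    columns="action",
    values="person",
    aggfunc="count",
    fill_value=0,
)

# Alternative: a frequency table over the same keys
xtab = pd.crosstab([combined["location"], combined["year"]], combined["action"])
print(counts)
```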
After the pivot table is created, you can rename the columns however you'd like.
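For example, to get the action_a_count layout from the question, one option (a sketch on a made-up count table) is to add a prefix and suffix to the action columns and then move location and year back into ordinary columns.

```python
import pandas as pd

# Made-up count table shaped like the pivot_table output
counts = pd.DataFrame(
    {"a": [1, 0], "b": [0, 1], "c": [0, 1]},
    index=pd.MultiIndex.from_tuples([(1, 2020), (2, 2020)], names=["location", "year"]),
)
counts.columns.name = "action"

# a -> action_a_count, etc.; then location/year become regular columns
result = (
    counts.add_prefix("action_")
          .add_suffix("_count")
          .reset_index()
)
result.columns.name = None  # drop the leftover "action" label on the column axis
print(result)
```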