I have two dataframes. Say, for example, frame 1 is the student info:
student_id course
1 a
2 b
3 c
4 a
5 f
6 f
Frame 2 records each interaction a student has with a program:
student_id day number_of_clicks
1 4 60
1 5 34
1 7 87
2 3 33
2 4 29
2 8 213
2 9 46
3 2 103
I am trying to add the information from frame 2 to frame 1, i.e. for each student I would like to know the number of different days on which they accessed the database, and the sum of all their clicks on those days. For example:
student_id course no_days total_clicks
1 a 3 181
2 b 4 321
3 c 1 103
4 a 0 0
5 f 0 0
6 f 0 0
I've tried to do this with groupby, but I couldn't add the information back into frame 1 or figure out how to sum the number of clicks. Any ideas?
First we aggregate your df2 to the desired information using GroupBy.agg. Then we merge that information into df1:
agg = df2.groupby('student_id').agg(
    no_days=('day', 'size'),
    total_clicks=('number_of_clicks', 'sum')
)
df1 = df1.merge(agg, on='student_id', how='left').fillna(0)
student_id course no_days total_clicks
0 1 a 3.0 181.0
1 2 b 4.0 321.0
2 3 c 1.0 103.0
3 4 a 0.0 0.0
4 5 f 0.0 0.0
5 6 f 0.0 0.0
Or, if you like one-liners, here is the same method as above written as a single expression, in a more SQL-like style:
df1.merge(
    df2.groupby('student_id').agg(
        no_days=('day', 'size'),
        total_clicks=('number_of_clicks', 'sum')
    ),
    on='student_id',
    how='left'
).fillna(0)
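A note on the approach above: 'size' counts rows per student, which equals the number of different days only if each (student, day) pair appears at most once in df2. Also, fillna(0) leaves the new columns as floats, as the output shows. Below is a minimal sketch that counts distinct days with 'nunique' and casts back to integers. The frames are rebuilt from the question's sample data so the snippet runs on its own; the name result is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "course": ["a", "b", "c", "a", "f", "f"],
})
df2 = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 2, 3],
    "day": [4, 5, 7, 3, 4, 8, 9, 2],
    "number_of_clicks": [60, 34, 87, 33, 29, 213, 46, 103],
})

# 'nunique' counts distinct days, so a repeated (student, day) row
# would not be counted twice.
agg = df2.groupby("student_id").agg(
    no_days=("day", "nunique"),
    total_clicks=("number_of_clicks", "sum"),
)

# The left merge introduces NaN for students without interactions,
# which turns both columns into floats; fill and cast back to int.
result = (
    df1.merge(agg, on="student_id", how="left")
    .fillna({"no_days": 0, "total_clicks": 0})
    .astype({"no_days": int, "total_clicks": int})
)
print(result)
```

On the sample data this gives the same counts as 'size', since no student has two rows for the same day.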
Use merge, fill the null values with fillna, then aggregate using groupby.agg:
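import numpy as np

# The left merge introduces NaN (and float columns); fill and cast back to int.
# count_nonzero treats the filled-in 0 as "no day", so this assumes 0 is never a real day value.
df = df1.merge(df2, how='left').fillna(0)\
    .astype({'day': int, 'number_of_clicks': int})\
    .groupby(['student_id', 'course'], as_index=False)\
    .agg({'day': np.count_nonzero, 'number_of_clicks': 'sum'})
print(df)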
student_id course day number_of_clicks
0 1 a 3 181
1 2 b 4 321
2 3 c 1 103
3 4 a 0 0
4 5 f 0 0
5 6 f 0 0
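If 0 could be a legitimate day value, filling with 0 and counting non-zero entries would undercount. A possible variation, sketched here under the assumption that the frames look like the question's sample data, skips fillna altogether: 'nunique' ignores NaN, and 'sum' over an all-NaN group returns 0. The names merged and summary are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "course": ["a", "b", "c", "a", "f", "f"],
})
df2 = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 2, 3],
    "day": [4, 5, 7, 3, 4, 8, 9, 2],
    "number_of_clicks": [60, 34, 87, 33, 29, 213, 46, 103],
})

# Left merge keeps every student; those without interactions get NaN.
merged = df1.merge(df2, on="student_id", how="left")

summary = (
    merged.groupby(["student_id", "course"], as_index=False)
    .agg(
        no_days=("day", "nunique"),               # NaN is not counted
        total_clicks=("number_of_clicks", "sum"),  # NaN is skipped, empty sum is 0
    )
    .astype({"total_clicks": int})  # the merged column is float because of NaN
)
print(summary)
```

This also renames the result columns to no_days and total_clicks, matching the layout asked for in the question.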