
pandas number of items in one column per value in another column

I have two dataframes. For example, frame 1 holds the student info:

student_id course
1          a
2          b
3          c
4          a
5          f
6          f

Frame 2 records each interaction a student has with a program:

student_id day   number_of_clicks
1          4     60
1          5     34
1          7     87
2          3     33
2          4     29
2          8     213
2          9     46
3          2     103

I am trying to add the information from frame 2 to frame 1, i.e. for each student I would like to know the number of different days on which they accessed the database, and the sum of all the clicks on those days. E.g.:

student_id course no_days total_clicks
1          a      3       181
2          b      4       321
3          c      1       103
4          a      0       0
5          f      0       0
6          f      0       0

I've tried to do this with groupby, but I couldn't add the information back into frame 1 or figure out how to sum the number of clicks. Any ideas?

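First we aggregate df2 to the desired information using GroupBy.agg with named aggregation (available from pandas 0.25). Then we merge that information into df1 with a left join, so students without any interactions are kept, and fill their missing values with 0. Note that 'size' counts rows per student, which equals the number of different days only if each day appears at most once per student: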

agg = df2.groupby('student_id').agg(
    no_days=('day', 'size'),
    total_clicks=('number_of_clicks', 'sum')
)

df1 = df1.merge(agg, on='student_id', how='left').fillna(0)

   student_id course  no_days  total_clicks
0           1      a      3.0         181.0
1           2      b      4.0         321.0
2           3      c      1.0         103.0
3           4      a      0.0           0.0
4           5      f      0.0           0.0
5           6      f      0.0           0.0
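The float columns in the output above are a side effect of the left merge: students with no rows in df2 get NaN, which forces a float dtype, and fillna(0) does not convert it back. Below is a minimal sketch, assuming the sample data from the question, that casts the two new columns back to integers. It also uses 'nunique' instead of 'size' so that a day repeated for the same student is counted once. That is an assumption about what "different days" means; with the sample data both give the same result.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "course": ["a", "b", "c", "a", "f", "f"],
})
df2 = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 2, 3],
    "day": [4, 5, 7, 3, 4, 8, 9, 2],
    "number_of_clicks": [60, 34, 87, 33, 29, 213, 46, 103],
})

# One row per student: distinct days and total clicks.
agg = df2.groupby("student_id").agg(
    no_days=("day", "nunique"),
    total_clicks=("number_of_clicks", "sum"),
)

# Left merge keeps students 4-6, who have no interactions.
result = df1.merge(agg, on="student_id", how="left")

# Fill only the new columns, then restore an integer dtype.
new_cols = ["no_days", "total_clicks"]
result[new_cols] = result[new_cols].fillna(0).astype(int)

print(result)
```

Filling only the two new columns, rather than the whole frame, avoids touching course or any other column that might legitimately contain missing values.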

Or, if you like one-liners, here is the same method written as a single expression, in a more SQL-like style:

df1.merge(
    df2.groupby('student_id').agg(
        no_days=('day', 'size'),
        total_clicks=('number_of_clicks', 'sum')
    ),
    on='student_id',
    how='left'
).fillna(0)
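Keep in mind that the expression above returns a new frame and leaves df1 unchanged, so the result has to be assigned to a name. As a hedged alternative sketch (not part of the original answer), DataFrame.join can do the same left join, because the aggregated frame is indexed by student_id. The sample data is the one from the question; the names joined and merged are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "course": ["a", "b", "c", "a", "f", "f"],
})
df2 = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 2, 3],
    "day": [4, 5, 7, 3, 4, 8, 9, 2],
    "number_of_clicks": [60, 34, 87, 33, 29, 213, 46, 103],
})

agg = df2.groupby("student_id").agg(
    no_days=("day", "size"),
    total_clicks=("number_of_clicks", "sum"),
)

fill = {"no_days": 0, "total_clicks": 0}

# join() is a left join by default and matches df1's student_id
# column against the index of agg.
joined = df1.join(agg, on="student_id").fillna(fill)

# The merge-based version from the answer, assigned to a name.
merged = df1.merge(agg, on="student_id", how="left").fillna(fill)

print(joined)
```

Both variants produce the same columns and values; which one reads better is largely a matter of taste.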

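Alternatively, merge first, fill the null values with fillna, and then aggregate with groupby.agg. Note that the downcast argument of fillna is deprecated in recent pandas versions, and that np.count_nonzero would also skip a genuine day value of 0: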

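import numpy as np

# Left merge keeps every student; fillna(0) zeroes the missing rows, which np.count_nonzero then skips.
df = df1.merge(df2, how='left').fillna(0, downcast='infer')\
        .groupby(['student_id', 'course'], as_index=False)\
        .agg({'day': np.count_nonzero, 'number_of_clicks': 'sum'})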


print(df)
   student_id course  day  number_of_clicks
0           1      a    3               181
1           2      b    4               321
2           3      c    1               103
3           4      a    0                 0
4           5      f    0                 0
5           6      f    0                 0
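A variation on the merge-then-aggregate approach, sketched here under the assumption that the sample data from the question is used: the 'count' aggregation ignores NaN, so the missing days do not need to be filled with 0 before counting, and a real day value of 0 would still be counted. The column names no_days and total_clicks are taken from the desired output in the question.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "course": ["a", "b", "c", "a", "f", "f"],
})
df2 = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 2, 3],
    "day": [4, 5, 7, 3, 4, 8, 9, 2],
    "number_of_clicks": [60, 34, 87, 33, 29, 213, 46, 103],
})

# Students 4-6 get a single row with NaN in day and number_of_clicks.
merged = df1.merge(df2, on="student_id", how="left")

result = (
    merged.groupby(["student_id", "course"], as_index=False)
    .agg(
        no_days=("day", "count"),                  # counts non-NaN values only
        total_clicks=("number_of_clicks", "sum"),  # an all-NaN group sums to 0
    )
    # The sum is float because of the NaNs introduced by the merge.
    .astype({"no_days": int, "total_clicks": int})
)

print(result)
```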
