
pandas - groupby multiple values?

I have a DataFrame that contains cell phone usage, logging the date and duration of each call.

It looks like this (30-row sample):

          id  user_id  call_date  duration
0    1000_93     1000 2018-12-27      8.52
1   1000_145     1000 2018-12-27     13.66
2   1000_247     1000 2018-12-27     14.48
3   1000_309     1000 2018-12-28      5.76
4   1000_380     1000 2018-12-30      4.22
5   1000_388     1000 2018-12-31      2.20
6   1000_510     1000 2018-12-27      5.75
7   1000_521     1000 2018-12-28     14.18
8   1000_530     1000 2018-12-28      5.77
9   1000_544     1000 2018-12-26      4.40
10  1000_693     1000 2018-12-31      4.31
11  1000_705     1000 2018-12-31     12.78
12  1000_735     1000 2018-12-29      1.70
13  1000_778     1000 2018-12-28      3.29
14  1000_826     1000 2018-12-26      9.96
15  1000_842     1000 2018-12-27      5.85
16    1001_0     1001 2018-09-06     10.06
17    1001_1     1001 2018-10-12      1.00
18    1001_2     1001 2018-10-17     15.83
19    1001_4     1001 2018-12-05      0.00
20    1001_5     1001 2018-12-13      6.27
21    1001_6     1001 2018-12-04      7.19
22    1001_8     1001 2018-11-17      2.45
23    1001_9     1001 2018-11-19      2.40
24   1001_11     1001 2018-11-09      1.00
25   1001_13     1001 2018-12-24      0.00
26   1001_19     1001 2018-11-15     30.00
27   1001_20     1001 2018-09-21      5.75
28   1001_23     1001 2018-10-27      0.98
29   1001_26     1001 2018-10-28      5.90
30   1001_29     1001 2018-09-30     14.78

I want to group by user_id AND call_date with the ultimate goal of calculating the number of minutes used per month over the course of the year, per user.

I thought I could accomplish this by using:

calls.groupby(['user_id','call_date'])['duration'].sum()

but the results aren't what I expected:

  user_id  call_date 
1000     2018-12-26    14.36
         2018-12-27    48.26
         2018-12-28    29.00
         2018-12-29     1.70
         2018-12-30     4.22
         2018-12-31    19.29
1001     2018-08-14    13.86
         2018-08-16    23.46
         2018-08-17     8.11
         2018-08-18     1.74
         2018-08-19    10.73
         2018-08-20     7.32
         2018-08-21     0.00
         2018-08-23     8.50
         2018-08-24     8.63
         2018-08-25    35.39
         2018-08-27    10.57
         2018-08-28    19.91
         2018-08-29     0.54
         2018-08-31    22.38
         2018-09-01     7.53
         2018-09-02    10.27
         2018-09-03    30.66
         2018-09-04     0.00
         2018-09-05     9.09
         2018-09-06    10.06

I'd hoped that it would be grouped like: user_id 1000, all calls for Jan with duration summed, all calls for Feb with duration summed, etc.

I am really new to Python and programming in general, and I'm not sure what my next step should be to get these grouped by user_id and month of the year.

Thanks in advance for any insight you can offer.

Regards,

Jared

Something is not quite right in your setup. First of all, both of your tables are the same, so I am not sure if this is a cut-and-paste error or something else. Here is what I do with your data. Load it up like so; note that we explicitly convert call_date to datetime:

from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(
"""
          id  user_id  call_date  duration
0    1000_93     1000 2018-12-27      8.52
1   1000_145     1000 2018-12-27     13.66
2   1000_247     1000 2018-12-27     14.48
3   1000_309     1000 2018-12-28      5.76
4   1000_380     1000 2018-12-30      4.22
5   1000_388     1000 2018-12-31      2.20
6   1000_510     1000 2018-12-27      5.75
7   1000_521     1000 2018-12-28     14.18
8   1000_530     1000 2018-12-28      5.77
9   1000_544     1000 2018-12-26      4.40
10  1000_693     1000 2018-12-31      4.31
11  1000_705     1000 2018-12-31     12.78
12  1000_735     1000 2018-12-29      1.70
13  1000_778     1000 2018-12-28      3.29
14  1000_826     1000 2018-12-26      9.96
15  1000_842     1000 2018-12-27      5.85
16    1001_0     1001 2018-09-06     10.06
17    1001_1     1001 2018-10-12      1.00
18    1001_2     1001 2018-10-17     15.83
19    1001_4     1001 2018-12-05      0.00
20    1001_5     1001 2018-12-13      6.27
21    1001_6     1001 2018-12-04      7.19
22    1001_8     1001 2018-11-17      2.45
23    1001_9     1001 2018-11-19      2.40
24   1001_11     1001 2018-11-09      1.00
25   1001_13     1001 2018-12-24      0.00
26   1001_19     1001 2018-11-15     30.00
27   1001_20     1001 2018-09-21      5.75
28   1001_23     1001 2018-10-27      0.98
29   1001_26     1001 2018-10-28      5.90
30   1001_29     1001 2018-09-30     14.78
"""), delim_whitespace = True, index_col=0)
df['call_date'] = pd.to_datetime(df['call_date'])
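
As a quick sanity check (a small addition here, not strictly required), df.dtypes should now report call_date as a datetime column, roughly:

df.dtypes
# id                   object
# user_id               int64
# call_date    datetime64[ns]
# duration            float64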

Then using

df.groupby(['user_id','call_date'])['duration'].sum()

does the expected grouping by user and by each date:

user_id  call_date 
1000     2018-12-26    14.36
         2018-12-27    48.26
         2018-12-28    29.00
         2018-12-29     1.70
         2018-12-30     4.22
         2018-12-31    19.29
1001     2018-09-06    10.06
         2018-09-21     5.75
         2018-09-30    14.78
         2018-10-12     1.00
         2018-10-17    15.83
         2018-10-27     0.98
         2018-10-28     5.90
         2018-11-09     1.00
         2018-11-15    30.00
         2018-11-17     2.45
         2018-11-19     2.40
         2018-12-04     7.19
         2018-12-05     0.00
         2018-12-13     6.27
         2018-12-24     0.00

If you want to group by month, as you seem to suggest, you can use the Grouper functionality:

df.groupby(['user_id',pd.Grouper(key='call_date', freq='1M')])['duration'].sum()

which produces

user_id  call_date 
1000     2018-12-31    116.83
1001     2018-09-30     30.59
         2018-10-31     23.71
         2018-11-30     35.85
         2018-12-31     13.46
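
If you would rather see month labels like 2018-12 instead of month-end dates, one variation on the same idea (sketched here as a possible alternative, not part of the Grouper approach above) is to group on a monthly Period derived from call_date:

# same monthly totals, but indexed by a Period such as 2018-12
monthly = df.groupby(['user_id', df['call_date'].dt.to_period('M')])['duration'].sum()
print(monthly)

# optionally pivot the months into columns, one row per user
print(monthly.unstack(fill_value=0))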

Let me know if you are getting different results from following these steps.
