
How to split the dataframe and group sum?

Each IP address has 6121 rows of data, and the same datetime values repeat across the IP addresses. I want to group the datetimes by month.

What I tried is:

df.groupby(['ip_addr'], [pd.TimeGrouper('D')]).sum()

But the result that comes out is:

datetime, no_of_queriers (for all ip_addr values combined).

The columns I want to get are:

datetime (by month), no_of_queriers, ip_addr.

Please help me with this!

       datetime             no_of_queriers  ip_addr
0      2014-02-16 00:00:00  0               1.204.33.193
1      2014-02-16 01:00:00  0               1.204.33.193
2      2014-02-16 02:00:00  0               1.204.33.193
3      2014-02-16 03:00:00  0               1.204.33.193
4      2014-02-16 04:00:00  0               1.204.33.193
5      2014-02-16 05:00:00  0               1.204.33.193
6      2014-02-16 06:00:00  0               1.204.33.193
7      2014-02-16 07:00:00  0               1.204.33.193
8      2014-02-16 08:00:00  0               1.204.33.193
9      2014-02-16 09:00:00  0               1.204.33.193
10     2014-02-16 10:00:00  0               1.204.33.193
11     2014-02-16 11:00:00  0               1.204.33.193
12     2014-02-16 12:00:00  0               1.204.33.193
13     2014-02-16 13:00:00  0               1.204.33.193
14     2014-02-16 14:00:00  0               1.204.33.193
15     2014-02-16 15:00:00  0               1.204.33.193
16     2014-02-16 16:00:00  0               1.204.33.193
17     2014-02-16 17:00:00  0               1.204.33.193
18     2014-02-16 18:00:00  0               1.204.33.193
19     2014-02-16 19:00:00  0               1.204.33.193
20     2014-02-16 20:00:00  0               1.204.33.193
21     2014-02-16 21:00:00  0               1.204.33.193
22     2014-02-16 22:00:00  0               1.204.33.193
23     2014-02-16 23:00:00  0               1.204.33.193
24     2014-02-17 00:00:00  0               1.204.33.193
25     2014-02-17 01:00:00  0               1.204.33.193
26     2014-02-17 02:00:00  0               1.204.33.193
27     2014-02-17 03:00:00  0               1.204.33.193
28     2014-02-17 04:00:00  0               1.204.33.193
29     2014-02-17 05:00:00  0               1.204.33.193
...    ...                  ...             ...
30575  2014-10-27 19:00:00  0               1.204.33.85
30576  2014-10-27 20:00:00  0               1.204.33.85
30577  2014-10-27 21:00:00  0               1.204.33.85
30578  2014-10-27 22:00:00  0               1.204.33.85
30579  2014-10-27 23:00:00  0               1.204.33.85
30580  2014-10-28 00:00:00  0               1.204.33.85
30581  2014-10-28 01:00:00  0               1.204.33.85
30582  2014-10-28 02:00:00  0               1.204.33.85
30583  2014-10-28 03:00:00  0               1.204.33.85
30584  2014-10-28 04:00:00  0               1.204.33.85
30585  2014-10-28 05:00:00  0               1.204.33.85
30586  2014-10-28 06:00:00  0               1.204.33.85
30587  2014-10-28 07:00:00  0               1.204.33.85
30588  2014-10-28 08:00:00  0               1.204.33.85
30589  2014-10-28 09:00:00  0               1.204.33.85
30590  2014-10-28 10:00:00  0               1.204.33.85
30591  2014-10-28 11:00:00  0               1.204.33.85
30592  2014-10-28 12:00:00  0               1.204.33.85
30593  2014-10-28 13:00:00  0               1.204.33.85
30594  2014-10-28 14:00:00  0               1.204.33.85
30595  2014-10-28 15:00:00  0               1.204.33.85
30596  2014-10-28 16:00:00  0               1.204.33.85
30597  2014-10-28 17:00:00  0               1.204.33.85
30598  2014-10-28 18:00:00  0               1.204.33.85
30599  2014-10-28 19:00:00  0               1.204.33.85
30600  2014-10-28 20:00:00  0               1.204.33.85
30601  2014-10-28 21:00:00  0               1.204.33.85
30602  2014-10-28 22:00:00  0               1.204.33.85
30603  2014-10-28 23:00:00  0               1.204.33.85
30604  2014-10-29 00:00:00  0               1.204.33.85

This is what you want:

df.groupby(['ip_addr',pd.Grouper(key='datetime',freq='M')]).count()
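Note that .count() returns the number of rows in each group, while the question asks for a sum, so .sum() may be closer to what is needed. Below is a minimal sketch of the monthly sum per IP address. It assumes the datetime column already has a datetime dtype, and the sample rows are made up for illustration (they are not the asker's data). It buckets rows with dt.to_period('M') rather than pd.Grouper, a swapped-in alternative that avoids depending on a particular frequency alias.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the frame in the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2014-02-16 00:00:00", "2014-02-16 01:00:00", "2014-03-01 00:00:00",
        "2014-02-16 00:00:00", "2014-03-01 00:00:00", "2014-03-01 01:00:00",
    ]),
    "no_of_queriers": [1, 2, 3, 4, 5, 6],
    "ip_addr": ["1.204.33.193"] * 3 + ["1.204.33.85"] * 3,
})

# Month bucket for each row (a Period such as 2014-02).
# The name "month" is an illustrative choice, not from the original post.
month = df["datetime"].dt.to_period("M").rename("month")

# Group by IP address and month, then total the query counts.
monthly = (
    df.groupby(["ip_addr", month])["no_of_queriers"]
      .sum()
      .reset_index()
)
print(monthly)
```

The result has one row per (ip_addr, month) pair, with the columns ip_addr, month and no_of_queriers. To get the column order asked for in the question, select them explicitly: monthly[["month", "no_of_queriers", "ip_addr"]].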

