

Pandas, Python: Count unique name-method occurrences every 5 seconds

I've been given a pandas DataFrame in the following format:

             datetime                        name  mtd  code
0 2017-09-07 00:00:08                 profile/log  GET   300
1 2017-09-07 00:00:17                 profile/log  PUT   300
3 2017-09-07 00:00:19                     unknown  PUT   200
4 2017-09-07 00:00:21            extras/dashboard  GET   300
5 2017-09-07 00:00:23                extras/stats  GET   300
6 2017-09-07 00:00:26            extras/dashboard  GET   300
7 2017-09-07 00:00:29  extras/authz-profile/check  GET   200
8 2017-09-07 00:00:34                       about  PUT   300
9 2017-09-07 00:00:36                  extras/fav  GET   304
2 2017-09-07 00:00:44                extras/store  GET   200

What I want to do is:

  • count the number of occurrences of each name-mtd pair whose response code starts with 3, in each 5-second interval from 2017-09-07 00:00:10 to 2017-09-07 00:00:40.
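Before any counting, the requirement reduces to two row filters: the code starts with 3, and the timestamp falls inside the window. A minimal sketch, assuming `datetime` holds datetime64 values and `code` holds integers (the question does not state the dtypes), and treating the window as half-open [00:00:10, 00:00:45) so that the last bin, which starts at 00:00:40, is complete (also an assumption):

```python
import pandas as pd

# Sample data from the question (dtypes assumed: datetime64 and int)
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-09-07 00:00:08", "2017-09-07 00:00:17", "2017-09-07 00:00:19",
        "2017-09-07 00:00:21", "2017-09-07 00:00:23", "2017-09-07 00:00:26",
        "2017-09-07 00:00:29", "2017-09-07 00:00:34", "2017-09-07 00:00:36",
        "2017-09-07 00:00:44",
    ]),
    "name": ["profile/log", "profile/log", "unknown", "extras/dashboard",
             "extras/stats", "extras/dashboard", "extras/authz-profile/check",
             "about", "extras/fav", "extras/store"],
    "mtd": ["GET", "PUT", "PUT", "GET", "GET", "GET", "GET", "PUT", "GET", "GET"],
    "code": [300, 300, 200, 300, 300, 300, 200, 300, 304, 200],
}, index=[0, 1, 3, 4, 5, 6, 7, 8, 9, 2])

start = pd.Timestamp("2017-09-07 00:00:10")
end = pd.Timestamp("2017-09-07 00:00:45")  # exclusive; closes the bin that starts at 00:00:40

# "Starts with 3" is checked on the string form, so 300, 304, ... all match
is_3xx = df["code"].astype(str).str.startswith("3")
in_window = (df["datetime"] >= start) & (df["datetime"] < end)
filtered = df[is_3xx & in_window]
print(filtered)
```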

The desired output is:

     datetime_start     pair                      3??_count
2017-09-07 00:00:10     profile/log - GET         2
2017-09-07 00:00:15     -                         0
2017-09-07 00:00:20     extras/dashboard - GET    1
2017-09-07 00:00:20     extras/stats - GET        1
2017-09-07 00:00:25     extras/dashboard - GET    1
2017-09-07 00:00:30     about - PUT               1
2017-09-07 00:00:35     extras/fav - GET          1
2017-09-07 00:00:40     -                         0   

How can I do that with pandas?

I have written a piece of code that creates the time periods shown in the desired output table, but I don't know how to count the 3?? name-mtd pairs in each 5-second period. I would highly appreciate any help!

import pandas as pd
data = pd.DataFrame({'datetime_start': pd.date_range(start="2017-09-07 00:00:10", end="2017-09-07 00:00:40", freq="5s")})
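The `pd.date_range` call produces the bin labels, but not the mapping from rows to bins. One option for that mapping is `pd.cut` with left-closed intervals. This is a sketch, assuming one extra edge at 00:00:45 so that the last bin [00:00:40, 00:00:45) has a right boundary. The timestamps come from the question's sample, plus an added 00:00:40 value to show the boundary behaviour.

```python
import pandas as pd

# Bin edges: the question's 00:00:10 ... 00:00:40 starts plus an assumed closing edge at 00:00:45
edges = pd.date_range(start="2017-09-07 00:00:10", end="2017-09-07 00:00:45", freq="5s")
starts = edges[:-1]  # seven bin starts, 00:00:10 through 00:00:40

# A few timestamps from the question's sample, plus the 00:00:40 boundary
times = pd.Series(pd.to_datetime([
    "2017-09-07 00:00:08", "2017-09-07 00:00:17",
    "2017-09-07 00:00:40", "2017-09-07 00:00:44",
]))

# right=False makes every interval [start, start + 5s); times outside all bins become NaN
bin_start = pd.cut(times, bins=edges, right=False, labels=starts)
print(bin_start)
```

Here 00:00:08 falls before the first edge and gets NaN, 00:00:17 maps to 00:00:15, and both 00:00:40 and 00:00:44 map to 00:00:40.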

One approach is to create a start_date column:

import datetime
df['start_date'] = df['datetime'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour, dt.minute, 5 * (dt.second // 5)))  # floor to the 5-second bin
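The lambda truncates the seconds to a multiple of 5. For tz-naive timestamps like these, `Series.dt.floor('5s')` does the same thing without a Python-level loop. Here is a quick check on a few of the question's timestamps, assuming the column holds datetime64 values:

```python
import datetime
import pandas as pd

times = pd.Series(pd.to_datetime([
    "2017-09-07 00:00:08", "2017-09-07 00:00:17", "2017-09-07 00:00:44",
]))

# Per-row approach: rebuild each timestamp with seconds truncated to a multiple of 5
via_apply = times.apply(lambda dt: datetime.datetime(
    dt.year, dt.month, dt.day, dt.hour, dt.minute, 5 * (dt.second // 5)))

# Vectorised alternative: floor to the 5-second boundary
via_floor = times.dt.floor("5s")

print(via_floor.tolist())
```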

Then you can aggregate:

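# keep responses whose code starts with 3, then count each name-mtd pair per bin
df[df['code'].astype(str).str.startswith('3')].groupby(['start_date', 'name', 'mtd']).size()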
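Below is a sketch of the full pipeline the question asks for. It filters to 3xx responses, floors each row to its 5-second bin, keeps only the bins from 00:00:10 to 00:00:40, and counts name-mtd pairs. It also adds a placeholder row ("-", 0) for each bin without a hit, mirroring the desired output. The sample data and dtypes are assumed from the question's printout. On this data, profile/log - PUT lands in the 00:00:15 bin, and the 00:00:10 and 00:00:40 bins stay empty, because the 00:00:08 row falls before the window.

```python
import pandas as pd

# Sample data from the question (dtypes assumed: datetime64 and int)
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-09-07 00:00:08", "2017-09-07 00:00:17", "2017-09-07 00:00:19",
        "2017-09-07 00:00:21", "2017-09-07 00:00:23", "2017-09-07 00:00:26",
        "2017-09-07 00:00:29", "2017-09-07 00:00:34", "2017-09-07 00:00:36",
        "2017-09-07 00:00:44",
    ]),
    "name": ["profile/log", "profile/log", "unknown", "extras/dashboard",
             "extras/stats", "extras/dashboard", "extras/authz-profile/check",
             "about", "extras/fav", "extras/store"],
    "mtd": ["GET", "PUT", "PUT", "GET", "GET", "GET", "GET", "PUT", "GET", "GET"],
    "code": [300, 300, 200, 300, 300, 300, 200, 300, 304, 200],
}, index=[0, 1, 3, 4, 5, 6, 7, 8, 9, 2])

# Bin starts exactly as in the question
bin_starts = pd.date_range("2017-09-07 00:00:10", "2017-09-07 00:00:40", freq="5s")

# Keep 3xx responses, label each row with its bin start and its name-mtd pair
hits = df[df["code"].astype(str).str.startswith("3")].copy()
hits["datetime_start"] = hits["datetime"].dt.floor("5s")
hits["pair"] = hits["name"] + " - " + hits["mtd"]

# Drop rows whose bin lies outside 00:00:10 ... 00:00:40
hits = hits[(hits["datetime_start"] >= bin_starts[0]) & (hits["datetime_start"] <= bin_starts[-1])]

counts = (hits.groupby(["datetime_start", "pair"]).size()
              .rename("3??_count")
              .reset_index())

# Bins with no 3xx hit get a placeholder row, like the "-  0" rows in the desired output
empty = bin_starts.difference(pd.DatetimeIndex(counts["datetime_start"]))
placeholder = pd.DataFrame({"datetime_start": empty, "pair": "-", "3??_count": 0})

result = (pd.concat([counts, placeholder], ignore_index=True)
            .sort_values(["datetime_start", "pair"], ignore_index=True))
print(result)
```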

Here is another way to approach this.

Create a column that combines name and mtd:

df['pair'] = df['name']+' - '+df['mtd']

Then use a PeriodIndex to specify the period on which to group the datetime column:

res = df.groupby([pd.PeriodIndex(df.datetime.dt.round('5s'), freq='5s'),
                  'pair'])['pair'].count()

The output will be:

datetime             pair                            
2017-09-07 00:00:10  profile/log - GET                   1
2017-09-07 00:00:15  profile/log - PUT                   1
2017-09-07 00:00:20  extras/dashboard - GET              1
                     unknown - PUT                       1
2017-09-07 00:00:25  extras/dashboard - GET              1
                     extras/stats - GET                  1
2017-09-07 00:00:30  extras/authz-profile/check - GET    1
2017-09-07 00:00:35  about - PUT                         1
                     extras/fav - GET                    1
2017-09-07 00:00:45  extras/store - GET                  1
Name: pair, dtype: int64
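`dt.round('5s')` snaps each timestamp to the *nearest* 5-second mark, which is why 00:00:08 is counted under 00:00:10 and 00:00:44 under 00:00:45 in the output above. The grouping also counts every response code, not only 3xx. If each label should mark the start of a [t, t + 5 s) bin, `dt.floor` is the matching operation. This is a sketch on a few sample rows with a 3xx filter added. The column names follow the question and the dtypes are assumed.

```python
import pandas as pd

# A few rows from the question's sample (dtypes assumed: datetime64 and int)
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2017-09-07 00:00:08", "2017-09-07 00:00:17",
                                "2017-09-07 00:00:19", "2017-09-07 00:00:44"]),
    "name": ["profile/log", "profile/log", "unknown", "extras/store"],
    "mtd": ["GET", "PUT", "PUT", "GET"],
    "code": [300, 300, 200, 200],
})

# round() moves each timestamp to the nearest 5-second mark...
rounded = df["datetime"].dt.round("5s")   # 00:00:08 -> 00:00:10, 00:00:44 -> 00:00:45
# ...while floor() gives the start of the bin the timestamp actually falls in
floored = df["datetime"].dt.floor("5s")   # 00:00:08 -> 00:00:05, 00:00:44 -> 00:00:40

# Count 3xx name-mtd pairs per floored bin
hits = df[df["code"].astype(str).str.startswith("3")]
res = hits.groupby([
    hits["datetime"].dt.floor("5s").rename("datetime_start"),
    (hits["name"] + " - " + hits["mtd"]).rename("pair"),
]).size()
print(res)
```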
