Pandas, python: Count unique name-method occurrences each 5 minutes
I've been given a pandas dataframe in the following format:
datetime name mtd code
0 2017-09-07 00:00:08 profile/log GET 300
1 2017-09-07 00:00:17 profile/log PUT 300
3 2017-09-07 00:00:19 unknown PUT 200
4 2017-09-07 00:00:21 extras/dashboard GET 300
5 2017-09-07 00:00:23 extras/stats GET 300
6 2017-09-07 00:00:26 extras/dashboard GET 300
7 2017-09-07 00:00:29 extras/authz-profile/check GET 200
8 2017-09-07 00:00:34 about PUT 300
9 2017-09-07 00:00:36 extras/fav GET 304
2 2017-09-07 00:00:44 extras/store GET 200
What I want to do is: for every 5-second interval from 2017-09-07 00:00:10 to 2017-09-07 00:00:40, count each name-mtd pair whose response code starts with 3.
The desired output is:
datetime_start pair 3??_count
2017-09-07 00:00:10 profile/log - GET 2
2017-09-07 00:00:15 - 0
2017-09-07 00:00:20 extras/dashboard - GET 1
2017-09-07 00:00:20 extras/stats - GET 1
2017-09-07 00:00:25 extras/dashboard - GET 1
2017-09-07 00:00:30 about - PUT 1
2017-09-07 00:00:35 extras/fav - GET 1
2017-09-07 00:00:40 - 0
How can I do that with pandas?
I have written a piece of code that creates the time periods shown in the desired output table, but I don't know how to count the 3?? name-mtd pairs for each 5-second period. I would highly appreciate any help!
data['datetime_start'] = pd.date_range(start="2017-09-07 00:00:10", end="2017-09-07 00:00:40", freq="5S")
Create the start_date column by flooring each timestamp to the start of its 5-second interval:
import datetime
df['start_date'] = df['datetime'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour, dt.minute, 5 * (dt.second // 5)))
Then you can aggregate:
df.groupby(['start_date','name','mtd']).size()
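The aggregation above counts every row, while the question asks only for codes starting with 3. A minimal sketch of one way to add that filter, assuming the `code` column holds integers and using `dt.floor('5s')` as a vectorized stand-in for the lambda (the sample rows are copied from the question; the names `is_3xx` and `counts` are illustrative):

```python
import pandas as pd

# A few rows copied from the question's sample data.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-09-07 00:00:17",
        "2017-09-07 00:00:19",
        "2017-09-07 00:00:21",
        "2017-09-07 00:00:23",
        "2017-09-07 00:00:26",
    ]),
    "name": ["profile/log", "unknown", "extras/dashboard",
             "extras/stats", "extras/dashboard"],
    "mtd": ["PUT", "PUT", "GET", "GET", "GET"],
    "code": [300, 200, 300, 300, 300],
})

# Assumption: code is an integer, so "starts with 3" means 300-399.
is_3xx = df["code"].between(300, 399)

# Floor each timestamp to the start of its 5-second interval.
df["start_date"] = df["datetime"].dt.floor("5s")

# Count the remaining rows per interval and name-mtd combination.
counts = df[is_3xx].groupby(["start_date", "name", "mtd"]).size()
print(counts)
```

If `code` is stored as strings instead, `df["code"].str.startswith("3")` would be the corresponding filter.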
Here is one way to approach this.
Create a column that combines name and mtd, as below:
df['pair'] = df['name']+' - '+df['mtd']
Then use PeriodIndex to specify the period on which to group the datetime column, as shown below:
res = df.groupby([pd.PeriodIndex(df.datetime.dt.round('5s'),freq='5S'),
'pair'])['pair'].count()
The output will be:
datetime pair
2017-09-07 00:00:10 profile/log - GET 1
2017-09-07 00:00:15 profile/log - PUT 1
2017-09-07 00:00:20 extras/dashboard - GET 1
unknown - PUT 1
2017-09-07 00:00:25 extras/dashboard - GET 1
extras/stats - GET 1
2017-09-07 00:00:30 extras/authz-profile/check - GET 1
2017-09-07 00:00:35 about - PUT 1
extras/fav - GET 1
2017-09-07 00:00:45 extras/store - GET 1
Name: pair, dtype: int64
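Note that this output counts every response code, rounds timestamps to the nearest 5 seconds instead of flooring them, and omits intervals with no rows. A hedged sketch of one way to get closer to the requested table follows. It assumes integer codes and floored interval starts, and it left-joins onto the asker's `date_range` grid so that empty intervals appear with a count of 0. The column name `count_3xx` and the variables `hits`, `grid` and `result` are hypothetical, and the exact bin edges the asker intends are not fully clear from the sample.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-09-07 00:00:08", "2017-09-07 00:00:17", "2017-09-07 00:00:19",
        "2017-09-07 00:00:21", "2017-09-07 00:00:23", "2017-09-07 00:00:26",
        "2017-09-07 00:00:29", "2017-09-07 00:00:34", "2017-09-07 00:00:36",
        "2017-09-07 00:00:44",
    ]),
    "name": ["profile/log", "profile/log", "unknown", "extras/dashboard",
             "extras/stats", "extras/dashboard", "extras/authz-profile/check",
             "about", "extras/fav", "extras/store"],
    "mtd": ["GET", "PUT", "PUT", "GET", "GET", "GET", "GET", "PUT", "GET", "GET"],
    "code": [300, 300, 200, 300, 300, 300, 200, 300, 304, 200],
})
df["pair"] = df["name"] + " - " + df["mtd"]

# Keep only 3xx responses and floor them to their 5-second interval start.
hits = df[df["code"].between(300, 399)].copy()
hits["datetime_start"] = hits["datetime"].dt.floor("5s")

# One row per (interval, pair) with its count.
counts = (
    hits.groupby(["datetime_start", "pair"])
        .size()
        .reset_index(name="count_3xx")
)

# Every interval start in the requested window, including empty ones.
grid = pd.DataFrame({
    "datetime_start": pd.date_range(
        "2017-09-07 00:00:10", "2017-09-07 00:00:40", freq="5s"
    )
})

# Left join keeps empty intervals; fill them with a blank pair and 0.
result = grid.merge(counts, on="datetime_start", how="left")
result["pair"] = result["pair"].fillna("")
result["count_3xx"] = result["count_3xx"].fillna(0).astype(int)
print(result)
```

Rows that fall outside the grid, such as the 00:00:08 request, are dropped by the left join.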