Get the average per minute in pandas

Question

There are already plenty of questions on Stack Overflow about this, but I have a small doubt that I think makes my question different. In my time series I want to get the average per minute. My time series looks like this:

      time                         duration
2018-08-26T14:00:00.000Z           0.22
2018-08-26T14:00:00.000Z           0.23
2018-08-26T14:00:00.000Z           2.05
2018-08-26T14:00:00.000Z           2.5
2018-08-26T14:00:00.000Z           3.0
2018-08-26T14:00:01.000Z           30.4 
2018-08-26T14:00:01.000Z           30.4 
2018-08-26T14:00:01.000Z           30.4 
2018-08-26T14:00:02.000Z           30.4 
2018-08-26T14:00:02.000Z           30.4 
2018-08-26T14:00:03.000Z           30.4 
.....
2018-08-26T14:01:03.000Z           30.4 
2018-08-26T14:01:03.000Z           30.4 
2018-08-26T14:02:03.000Z           30.4 
2018-08-26T14:02:03.000Z           30.4

Because the data comes from Elasticsearch, I have multiple observations with the same timestamp. By multiple I mean there may be 100 observations for a single one-second timestamp.

I am using the code below to compute the average duration per minute, which I got from "Group index by minute and compute average":

df.index = pd.DatetimeIndex(df.time)

df.groupby([df.index.values.astype('<M8[m]')])['duration'].mean()

My output looks like this:

2018-08-26 14:00:00    0.151470
2018-08-26 14:01:00    0.144745
2018-08-26 14:02:00    0.147503
2018-08-26 14:03:00    0.156921
2018-08-26 14:04:00    0.142978
2018-08-26 14:05:00    0.167170
2018-08-26 14:06:00    0.156233
2018-08-26 14:07:00    0.140044
2018-08-26 14:08:00    0.135376
2018-08-26 14:09:00    0.161247
2018-08-26 14:10:00    0.134211
2018-08-26 14:11:00    0.179065
2018-08-26 14:12:00    0.145470
2018-08-26 14:13:00    0.145623
2018-08-26 14:14:00    0.139927
2018-08-26 14:15:00    0.138283
2018-08-26 14:16:00    0.137545
2018-08-26 14:17:00    0.140346

I just want to make sure I am doing this right: since I have multiple observations per second, I am not sure whether all of them are being taken into account.

I would appreciate any kind of help here.

Answer 1

This is what .resample() is for:

resample() is a time-based groupby, followed by a reduction method on each of its groups.
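To make the "time-based groupby" description concrete, here is a minimal sketch on made-up toy data (the values and the names `s`, `by_resample` and `by_groupby` are illustrative, not from the question). It compares resample() with an explicit groupby on timestamps floored to the minute, where several rows share the same second:

```python
import pandas as pd

# Hypothetical toy data: several observations share the same second.
idx = pd.to_datetime([
    "2018-08-26 14:00:00", "2018-08-26 14:00:00", "2018-08-26 14:00:01",
    "2018-08-26 14:01:03", "2018-08-26 14:01:03",
])
s = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0], index=idx)

# Time-based groupby plus reduction in a single call.
by_resample = s.resample("min").mean()

# The same idea spelled out: floor each timestamp to its minute, then group.
by_groupby = s.groupby(s.index.floor("min")).mean()

print(by_resample)
print(by_groupby)
```

Both approaches average every row that falls in a given minute, duplicates included. One difference worth knowing: resample() emits a row for every minute in the covered range, so a minute with no observations shows up as NaN, while the plain groupby only lists minutes that actually occur in the data.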

Verifiable example:

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(444)

>>> # millisecond frequency, 100000 periods starting 2017-01-01 00:00:00
>>> idx = pd.date_range(start='2017', periods=100000, freq='ms')
>>> idx.min(), idx.max()
(Timestamp('2017-01-01 00:00:00', freq='L'), Timestamp('2017-01-01 00:01:39.999000', freq='L'))

>>> s = pd.Series(np.random.randn(len(idx)), index=idx)
>>> s.resample('s').mean().head()
2017-01-01 00:00:00    0.009352
2017-01-01 00:00:01    0.061978
2017-01-01 00:00:02   -0.011118
2017-01-01 00:00:03    0.046698
2017-01-01 00:00:04   -0.008205

Manual inspection should match:

>>> s.loc['2017-01-01 00:00:00'].mean()
0.00935201762323959
>>> s.loc['2017-01-01 00:00:01'].mean()
0.061978455181838
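Applied to data shaped like the question's, the same check can be done per minute. The following is a rough sketch that uses a handful of rows copied from the sample in the question (the elided rows are left out, so the numbers will not match the asker's real output). Adding a count next to the mean is one way to confirm that every duplicate-timestamp row contributes:

```python
import pandas as pd

# A few rows copied from the question's sample; the elided part is omitted.
df = pd.DataFrame({
    "time": [
        "2018-08-26T14:00:00.000Z", "2018-08-26T14:00:00.000Z",
        "2018-08-26T14:00:00.000Z", "2018-08-26T14:00:01.000Z",
        "2018-08-26T14:01:03.000Z", "2018-08-26T14:01:03.000Z",
    ],
    "duration": [0.22, 0.23, 2.05, 30.4, 30.4, 30.4],
})

# Parse the ISO 8601 strings into real timestamps before resampling.
df["time"] = pd.to_datetime(df["time"])

# Mean and number of rows per minute; the count shows how many
# observations went into each average.
per_minute = (
    df.set_index("time")["duration"]
      .resample("min")
      .agg(["mean", "count"])
)
print(per_minute)
```

If the count for a minute equals the number of rows you expect from Elasticsearch for that minute, then all observations, including those sharing a second, were used in the mean.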

Solution 1
2 2018-09-04 18:42:05
