How to calculate the maximum 15-min sum from a Pandas Series or DataFrame
Pandas newbie here. I have a dataset of traffic counts with timestamps. I want to know which 15-minute interval has the largest cumulative sum of counts, and the value of that sum.
Data might look something like this:
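```python
import random
import pandas as pd
# random.sample() needs a sequence, so the DatetimeIndex is converted to a list first
ts = pd.Series(range(1000), index=random.sample(list(pd.date_range('2015-02-01 06:00:00', periods=3000, freq='1min')), 1000)).sort_index()
```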
```
2015-02-01 06:06:00    314
2015-02-01 06:08:00    154
2015-02-01 06:09:00    914
2015-02-01 06:13:00     84
2015-02-01 06:18:00    880
2015-02-01 06:22:00    912
2015-02-01 06:28:00    410
2015-02-01 06:32:00    391
2015-02-01 06:34:00    270
2015-02-01 06:35:00    984
2015-02-01 06:36:00    271
2015-02-01 06:37:00    722
2015-02-01 06:38:00    748
2015-02-01 06:40:00    313
2015-02-01 06:42:00    277
2015-02-01 06:43:00    604
2015-02-01 06:49:00    888
2015-02-01 06:50:00    943
2015-02-01 06:51:00    124
2015-02-01 06:52:00    806
```
Is there a way to do this in Pandas?
A simple solution without using pandas' native functions:
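One pandas-native option is a time-based rolling window. This is a minimal sketch that assumes the counts are non-negative and stored in a Series with a DatetimeIndex, like `ts` above. `rolling('15min').sum()` gives, for every observed timestamp t, the sum over the window (t - 15 min, t]. Because the counts are non-negative, the busiest 15-minute window can always be shifted so that it ends on an observation, so checking only those windows is enough. The helper name `max_15min_sum` is hypothetical.

```python
import pandas as pd


def max_15min_sum(ts: pd.Series, window: str = "15min"):
    """Hypothetical helper: return (start, end, total) of the busiest window.

    Assumes non-negative counts indexed by timestamps.
    """
    # Time-based rolling windows need a monotonic (sorted) index
    rolled = ts.sort_index().rolling(window).sum()
    # Each entry is the sum over (t - window, t] for an observed timestamp t
    end = rolled.idxmax()
    return end - pd.Timedelta(window), end, rolled.max()


# Usage with the sample `ts` from the question:
# start, end, total = max_15min_sum(ts)
```

The returned `start` is the open left edge of the window, so observations exactly at `start` are not included in `total`.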
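```python
from datetime import timedelta

start = ts.index[0]
end = ts.index[-1]
dur = timedelta(minutes=15)
max_val = 0
max_seg = None
# Steps through consecutive, non-overlapping 15-minute windows
while start <= end:
    # Half-open window [start, start + dur); the label slice ts[start:start + dur]
    # includes both endpoints and would count a boundary timestamp twice
    cum_sum = ts[(ts.index >= start) & (ts.index < start + dur)].sum()
    if cum_sum > max_val:
        max_val = cum_sum
        max_seg = (start, start + dur)
    start = start + dur
print(max_val)
print(max_seg)
```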
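The loop above amounts to grouping the series into back-to-back 15-minute bins, which pandas can also do with `resample`. This sketch assumes a DatetimeIndex. `origin="start"` (available since pandas 1.1) anchors the first bin at the first timestamp, as the loop does, and the default `closed="left"` gives half-open bins [t, t + 15 min). The name `max_block_sum` is hypothetical. Like the loop, this approach only examines fixed bins, so a busier window that straddles a bin boundary can be missed.

```python
import pandas as pd


def max_block_sum(ts: pd.Series, freq: str = "15min"):
    """Hypothetical helper: busiest fixed, non-overlapping bin as (start, end, total)."""
    # Bins [t0, t0 + freq), [t0 + freq, t0 + 2*freq), ... with t0 = first timestamp
    blocks = ts.resample(freq, origin="start").sum()
    start = blocks.idxmax()
    return start, start + pd.Timedelta(freq), blocks.max()
```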
This is what I came up with:
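```python
def find_peak_15_minutes(data_frame, column):
    # Assumes `column` holds times as plain numbers in minutes and that each
    # row is one event, so .count() gives the number of events in the window
    max_sum = 0
    start_of_max15 = 0
    for start in data_frame[column].values:
        # Window [start, start + 15): inclusive="left" (pandas >= 1.3) keeps an
        # event exactly 15 minutes later out of this window
        in_window = data_frame[column].between(start, start + 15, inclusive="left")
        series_sum = data_frame[column][in_window].count()
        if series_sum > max_sum:
            max_sum = series_sum
            start_of_max15 = start
    return (start_of_max15, max_sum)
```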
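The function above rescans the whole column for every start, so it does O(n²) work. The same "window starting at each observation" idea can be vectorised with a prefix sum and `searchsorted`. This is a sketch under different assumptions from the function: it takes a Series with a DatetimeIndex like `ts` from the question, not a numeric minutes column, and it sums the values in [t, t + 15 min) for each observed t. To count rows instead, as `.count()` does, pass a Series of ones. `peak_window_from_each_start` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def peak_window_from_each_start(ts: pd.Series, window: str = "15min"):
    """Hypothetical helper: best (start, total) over windows [t, t + window)."""
    ts = ts.sort_index()
    idx = ts.index
    # For each t, position of the first timestamp >= t + window
    stop = idx.searchsorted(idx + pd.Timedelta(window), side="left")
    # prefix[k] = sum of the first k values, so a window sum is a difference
    prefix = np.concatenate(([0], np.cumsum(ts.to_numpy())))
    totals = prefix[stop] - prefix[np.arange(len(ts))]
    best = int(np.argmax(totals))
    return idx[best], totals[best]
```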