
pandas rolling max min mean

Being new to pandas, I am lost in the zillions of examples that generate smooth random data.

What I am trying to achieve is to create graphs with bokeh over a rolling time window. I want the x-axis to be the (resampled or whatever) timestamp, with 3 lines displaying the max, min and mean values of the duration field over a rolling window of, let's say, 15 seconds.

The joy stops before it starts... I have tried to apply quite a few examples without making progress or learning much.

The code below

import pandas as pd

d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"],
                               format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)

print(d2.head())
print(d2.rolling("15s", min_periods=1).mean().head())
print(d2.rolling("15s", min_periods=1).std().head())
print(d2.rolling("15s", min_periods=1).min().head())
print(d2.rolling("15s", min_periods=1).max().head())

produces an exception:

ValueError: window must be an integer

If I could get the rolling part to work, I could probably manage the bokeh side.

Any pointers that help make this happen are highly appreciated!

I have this data in csv:

ts_send,endpoint,duration,
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
2017-01-19 09:09:32.854,/api/sig,1.0
2017-01-19 09:09:36.776,/api/sig,0.637
2017-01-19 09:14:14.207,/api/sig,0.672
2017-01-19 09:14:16.906,/api/sig,0.533
2017-01-19 11:49:34.939,/api/sig,1.0
2017-01-19 11:49:38.709,/api/sig,0.529
2017-01-19 12:19:01.668,/api/sig,1.0
2017-01-19 12:19:05.559,/api/item,0.169
2017-01-19 12:19:05.559,/api/item,0.102
2017-01-19 12:19:05.559,/api/item,0.44
2017-01-19 12:19:05.585,/api/item,0.173
2017-01-19 12:19:06.633,/api/sig,0.564
2017-01-19 12:27:05.712,/api/sig,0.574
2017-01-19 12:27:08.370,/api/sig,0.497
2017-01-19 12:27:43.319,/api/sig,0.561
2017-01-19 12:27:45.873,/api/sig,0.508
2017-01-19 12:46:15.454,/api/sig,1.0
2017-01-19 12:46:20.409,/api/item,0.173
2017-01-19 12:46:20.427,/api/item,0.163
2017-01-19 12:46:20.457,/api/item,0.169
2017-01-19 12:46:20.474,/api/item,0.162
2017-01-19 12:46:20.618,/api/item,0.209
2017-01-19 12:46:20.642,/api/item,0.172
2017-01-19 12:46:20.695,/api/item,0.26
2017-01-19 12:46:20.698,/api/item,0.193
2017-01-19 12:46:20.788,/api/item,0.193
2017-01-19 12:46:20.822,/api/item,0.232
2017-01-19 12:46:20.873,/api/item,0.164
2017-01-19 12:46:20.875,/api/item,0.142
2017-01-19 12:46:20.905,/api/item,0.356
2017-01-19 12:46:20.998,/api/item,0.199

The timestamp ts_send has millisecond precision. There are periods when no events are recorded, and there are times when multiple events fall on a single millisecond.

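This will work if your time series is the index: an offset window such as "15s" needs a datetime-like index to measure against. Add this after the to_datetime conversion and before the rolling calls: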

d2.set_index('ts_send', inplace=True)
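
As a small illustration of that requirement, here is a sketch that uses a few rows copied from the question's CSV. The names csv_text, by_index and by_column are made up for this example, and the trailing comma in the header is left out to keep the frame tidy. It shows two ways that should give the same result: moving the timestamps into the index, or leaving the index alone and passing the timestamp column through the on= argument of rolling().

```python
import io
import pandas as pd

# A few rows copied from the question's CSV (trailing header comma omitted).
csv_text = """ts_send,endpoint,duration
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts_send"])

# Option 1: make the timestamps the index, then roll over the numeric column.
by_index = df.set_index("ts_send")["duration"].rolling("15s", min_periods=1).mean()

# Option 2: keep the default index and name the timestamp column with on=.
# Only the timestamp and the numeric column are passed in, so the string
# column "endpoint" never reaches the aggregation.
by_column = (
    df[["ts_send", "duration"]]
    .rolling("15s", on="ts_send", min_periods=1)
    .mean()["duration"]
)

print(by_index)
print(by_column)
```

Either way the timestamps have to be sorted in increasing order, which is why sorting the index first is a sensible precaution for real log data.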

Thanks to the kind members Boud and Goyo, I was able to move forward.

The code below produces what I needed:

import pandas as pd

d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"], format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)
# Time-based windows need a sorted datetime index; set_index also removes the ts_send column.
d3 = d2.set_index("ts_send").sort_index()

print(d3.index.is_monotonic_increasing)
print(d3.head())

# Roll over the numeric duration column only; endpoint holds strings.
duration = d3["duration"]
print(duration.rolling("5s", min_periods=1).mean())
print(duration.rolling("5s", min_periods=1).std())
print(duration.rolling("5s", min_periods=1).min())
print(duration.rolling("5s", min_periods=1).max())
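
Since the goal was three lines (max, min and mean) over the same window, one possible shortcut is to compute them in a single pass with agg(). The sketch below assumes the 15-second window from the question and uses six rows copied from its CSV; csv_text and stats are names made up for this example. The result is one DataFrame with a column per statistic, indexed by timestamp, which should be a convenient shape to hand to a plotting library.

```python
import io
import pandas as pd

# Rows copied from the question's CSV, including three events on the same millisecond.
csv_text = """ts_send,endpoint,duration
2017-01-19 12:19:01.668,/api/sig,1.0
2017-01-19 12:19:05.559,/api/item,0.169
2017-01-19 12:19:05.559,/api/item,0.102
2017-01-19 12:19:05.559,/api/item,0.44
2017-01-19 12:19:05.585,/api/item,0.173
2017-01-19 12:19:06.633,/api/sig,0.564
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts_send"])
duration = df.set_index("ts_send").sort_index()["duration"]

# One rolling object, three statistics: the result has columns max, min and mean.
stats = duration.rolling("15s", min_periods=1).agg(["max", "min", "mean"])

print(stats)
```

Note that this keeps one output row per event, so rows that share a timestamp appear more than once on the x-axis; whether to deduplicate them before plotting is a design choice left open here.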
