Being new to pandas, I am lost in the zillions of smooth random-data generation examples.
What I am trying to achieve is to create graphs using bokeh with a rolling time window. I want the x-axis to be the (resampled or whatever) timestamp, with 3 lines displaying the max, min and mean values of the duration field over, let's say, a rolling 15-second time window.
The joy stops before starting... I have tried to apply quite a few examples without making progress or learning much.
The code below
d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"], \
format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)
print (d2.head())
print (d2.rolling("15s", min_periods=1).mean().head())
print (d2.rolling("15s", min_periods=1).std().head())
print (d2.rolling("15s", min_periods=1).min().head())
print (d2.rolling("15s", min_periods=1).max().head())
produces an exception:
ValueError: window must be an integer
If I could get the rolling part to work, I could probably manage the bokeh side.
Any pointers that help make this happen are highly appreciated!
I have this data in csv:
ts_send,endpoint,duration,
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
2017-01-19 09:09:32.854,/api/sig,1.0
2017-01-19 09:09:36.776,/api/sig,0.637
2017-01-19 09:14:14.207,/api/sig,0.672
2017-01-19 09:14:16.906,/api/sig,0.533
2017-01-19 11:49:34.939,/api/sig,1.0
2017-01-19 11:49:38.709,/api/sig,0.529
2017-01-19 12:19:01.668,/api/sig,1.0
2017-01-19 12:19:05.559,/api/item,0.169
2017-01-19 12:19:05.559,/api/item,0.102
2017-01-19 12:19:05.559,/api/item,0.44
2017-01-19 12:19:05.585,/api/item,0.173
2017-01-19 12:19:06.633,/api/sig,0.564
2017-01-19 12:27:05.712,/api/sig,0.574
2017-01-19 12:27:08.370,/api/sig,0.497
2017-01-19 12:27:43.319,/api/sig,0.561
2017-01-19 12:27:45.873,/api/sig,0.508
2017-01-19 12:46:15.454,/api/sig,1.0
2017-01-19 12:46:20.409,/api/item,0.173
2017-01-19 12:46:20.427,/api/item,0.163
2017-01-19 12:46:20.457,/api/item,0.169
2017-01-19 12:46:20.474,/api/item,0.162
2017-01-19 12:46:20.618,/api/item,0.209
2017-01-19 12:46:20.642,/api/item,0.172
2017-01-19 12:46:20.695,/api/item,0.26
2017-01-19 12:46:20.698,/api/item,0.193
2017-01-19 12:46:20.788,/api/item,0.193
2017-01-19 12:46:20.822,/api/item,0.232
2017-01-19 12:46:20.873,/api/item,0.164
2017-01-19 12:46:20.875,/api/item,0.142
2017-01-19 12:46:20.905,/api/item,0.356
2017-01-19 12:46:20.998,/api/item,0.199
The timestamp ts_send has millisecond precision. There are times when no events are recorded, and there are times when there are multiple events in a single millisecond.
This will work if your time series is the index. Add this before you run your code:
d2.set_index('ts_send', inplace=True)
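To illustrate the point, here is a minimal sketch using the first four rows of the sample data from the question (loaded from an inline string, with a plain three-column header, purely for demonstration). It shows two ways to give a time-based window like "15s" something to measure against: making the timestamps the index, or leaving the default integer index alone and passing the column name through the on parameter of rolling. The names csv_text, by_index and by_column are only illustrative.

```python
import io

import pandas as pd

# A few rows copied from the sample data in the question.
csv_text = """ts_send,endpoint,duration
2017-01-19 09:03:28.600,/api/sig,1.0
2017-01-19 09:03:29.760,/api/sig,0.5
2017-01-19 09:04:51.210,/api/sig,0.508
2017-01-19 09:04:52.410,/api/sig,0.574
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts_send"])

# Option 1: make the timestamps the index, then roll over a 15-second window.
by_index = (
    df.set_index("ts_send")["duration"]
    .rolling("15s", min_periods=1)
    .mean()
)

# Option 2: keep the default integer index and tell rolling which
# datetime column defines the window.
by_column = df.rolling("15s", on="ts_send", min_periods=1)["duration"].mean()

print(by_index)
print(by_column)
```

Both variants should give the same rolling means; they differ only in whether the result is indexed by timestamp or by row number.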
Thanks to kind members Boud and Goyo I was able to move forward.
The code produces what I needed:
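import pandas as pd
d2 = pd.read_csv(input_file, delimiter=",")
d2["ts_send"] = pd.to_datetime(d2["ts_send"], format="%Y-%m-%d %H:%M:%S.%f", exact=True, utc=True)
# Use the timestamps as the index (pd.DatetimeIndex takes no inplace argument),
# then sort so the index is monotonic, which time-based windows require.
d3 = d2.set_index("ts_send").sort_index()
print(d3.index.is_monotonic_increasing)
print(d3.head())
# Roll over the numeric duration column only; endpoint is text and
# recent pandas versions may refuse to aggregate it.
print(d3["duration"].rolling("5s", min_periods=1).mean())
print(d3["duration"].rolling("5s", min_periods=1).std())
print(d3["duration"].rolling("5s", min_periods=1).min())
print(d3["duration"].rolling("5s", min_periods=1).max())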
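Since the original goal was three lines (max, min and mean) on one graph, it may be handy to collect the statistics into a single frame instead of printing them separately. The following is only a sketch, assuming a 15-second window and a handful of rows copied from the sample data; the names csv_text and stats are illustrative, and the plotting step itself is left out.

```python
import io

import pandas as pd

# Rows copied from the sample data, including duplicate timestamps.
csv_text = """ts_send,endpoint,duration
2017-01-19 12:19:01.668,/api/sig,1.0
2017-01-19 12:19:05.559,/api/item,0.169
2017-01-19 12:19:05.559,/api/item,0.102
2017-01-19 12:19:05.559,/api/item,0.44
2017-01-19 12:19:05.585,/api/item,0.173
2017-01-19 12:19:06.633,/api/sig,0.564
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts_send"])
df = df.set_index("ts_send").sort_index()

# One rolling object, three aggregations: the result is a DataFrame
# with one column per statistic, indexed by timestamp.
stats = df["duration"].rolling("15s", min_periods=1).agg(["min", "max", "mean"])

print(stats)
```

Each column of stats shares the same timestamp index, so it could be passed to a plotting library as one line per column.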