How to detect and filter peaks over time series data?
I have a pandas dataframe of user logins like this:
id datetime_login
646 2017-03-15 15:30:25
611 2017-04-14 11:38:30
611 2017-05-15 08:49:01
651 2017-03-15 15:30:25
611 2017-03-15 15:30:25
652 2017-03-08 14:03:56
652 2017-03-08 14:03:56
652 2017-03-15 15:30:25
654 2017-03-15 15:30:25
649 2017-03-15 15:30:25
902 2017-09-09 15:00:00
902 2017-02-13 16:39:53
902 2017-11-15 12:00:00
902 2017-11-15 12:00:00
902 2017-09-09 15:00:00
902 2017-05-15 08:48:47
902 2017-11-15 12:00:00
I plotted the number of logins per date:
import matplotlib.pyplot as plt
import pandas as pd
# Parse each login timestamp and keep only its calendar date
df.datetime_login = pd.to_datetime(df.datetime_login).dt.date
fig, ax = plt.subplots(figsize=(25, 10))
# Count logins per date and draw them in chronological order
df.datetime_login.value_counts().sort_index().plot(ax=ax, colormap='jet', fontsize=20)
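For a reproducible starting point, here is a minimal sketch that rebuilds the sample table above as a DataFrame (the inline records are simply the rows shown in the question) and produces the per-date login counts that the plot draws. `pd.to_datetime(...).dt.date` yields the same calendar dates as slicing the timestamp strings and calling `date(...)` by hand:

```python
import pandas as pd

# Sample logins copied from the table in the question
records = [
    (646, "2017-03-15 15:30:25"), (611, "2017-04-14 11:38:30"),
    (611, "2017-05-15 08:49:01"), (651, "2017-03-15 15:30:25"),
    (611, "2017-03-15 15:30:25"), (652, "2017-03-08 14:03:56"),
    (652, "2017-03-08 14:03:56"), (652, "2017-03-15 15:30:25"),
    (654, "2017-03-15 15:30:25"), (649, "2017-03-15 15:30:25"),
    (902, "2017-09-09 15:00:00"), (902, "2017-02-13 16:39:53"),
    (902, "2017-11-15 12:00:00"), (902, "2017-11-15 12:00:00"),
    (902, "2017-09-09 15:00:00"), (902, "2017-05-15 08:48:47"),
    (902, "2017-11-15 12:00:00"),
]
df = pd.DataFrame(records, columns=["id", "datetime_login"])

# Parse the timestamps and keep only the calendar date (datetime.date objects)
df["datetime_login"] = pd.to_datetime(df["datetime_login"]).dt.date

# One count per date, in chronological order: this is the plotted series
daily_counts = df["datetime_login"].value_counts().sort_index()
print(daily_counts)
```

The result holds one integer per date, so it is a plain numeric signal, unlike the DataFrame itself, which also carries the `id` column and non-numeric dates.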
How can I detect the peaks in this time series data, i.e. the spikes visible in my plot?
How can I filter those peaks into an array?
I tried:
import peakutils
indices = peakutils.indexes(df, thres=0.4, min_dist=1000)
print(indices)
However, I got:
TypeError: unsupported operand type(s) for -: 'datetime.date' and 'int'
The plotted series is df.datetime_login.value_counts().sort_index(), i.e. the number of logins on each date (figure not included).
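`peakutils.indexes` treats its input as a 1-D numeric signal: per its documentation, `thres` is a threshold normalized between the signal's minimum and maximum, and `min_dist` is the minimum distance between detected peaks, with the highest peak preferred. Passing the whole `df` hands it the `id` column plus the `datetime.date` column, which is where the `TypeError` comes from. To illustrate the idea, here is a simplified NumPy sketch of threshold-plus-minimum-distance peak picking; it is not peakutils' actual source, and the name `find_peak_indexes` is hypothetical:

```python
import numpy as np

def find_peak_indexes(y, thres=0.4, min_dist=1):
    """Hypothetical helper: positions of local maxima above a relative
    threshold, keeping only the tallest peak among any that lie closer
    together than min_dist samples."""
    y = np.asarray(y, dtype=float)
    if y.size < 3:
        return np.array([], dtype=int)
    # Relative threshold: thres=0 maps to the minimum, thres=1 to the maximum
    cutoff = y.min() + thres * (y.max() - y.min())
    # Strict interior local maxima: rising into the point, falling after it
    # (plateaus and the two endpoints are never reported)
    dy = np.diff(y)
    candidates = np.where((dy[:-1] > 0) & (dy[1:] < 0))[0] + 1
    candidates = candidates[y[candidates] > cutoff]
    # Visit candidates from tallest to shortest, dropping any too close to a kept one
    kept = []
    for idx in candidates[np.argsort(y[candidates])[::-1]]:
        if all(abs(idx - k) >= min_dist for k in kept):
            kept.append(idx)
    return np.sort(np.array(kept, dtype=int))
```

On the sample data's per-date counts `[1, 2, 6, 1, 2, 2, 3]`, the cutoff for `thres=0.4` is 3, and only position 2 (the 2017-03-15 spike) qualifies; the final value 3 sits at the edge, so it is not treated as a peak.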
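Let's try the following: pass peakutils.indexes the Series returned by value_counts().sort_index() instead of your original df. That Series holds one login count per date, which is the numeric signal peakutils expects, and selecting the returned positions from it gives you the peaks as a filtered Series: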
# One login count per date; indexes() returns positions, so select them with .iloc
df_counts = df.datetime_login.value_counts().sort_index()
df_counts.iloc[peakutils.indexes(df_counts, thres=0.4, min_dist=1000)]
Output:
2017-03-15 15:30:25 6
Name: datetime_login, dtype: int64
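If peakutils is not available, SciPy's `scipy.signal.find_peaks` can do the same job; note this is a swapped-in alternative, not what the answer above uses. Its `height` argument is an absolute threshold, so the sketch below converts the relative `thres=0.4` into an absolute height. It assumes the counts are in a Series indexed by date; here the values are typed in from the sample data for a self-contained example:

```python
import pandas as pd
from scipy.signal import find_peaks

# Logins per date from the sample data (same values as value_counts().sort_index())
counts = pd.Series(
    [1, 2, 6, 1, 2, 2, 3],
    index=pd.to_datetime([
        "2017-02-13", "2017-03-08", "2017-03-15", "2017-04-14",
        "2017-05-15", "2017-09-09", "2017-11-15",
    ]),
)

# Convert a relative threshold of 0.4 into an absolute peak height
height = counts.min() + 0.4 * (counts.max() - counts.min())
# find_peaks returns positional indexes plus a dict of peak properties
positions, _ = find_peaks(counts.to_numpy(), height=height)
# Map positions back to dates to get the peaks as a filtered Series
peaks = counts.iloc[positions]
print(peaks)
```

To mimic `min_dist`, add `distance=N` (a minimum spacing in samples, N >= 1); for noisier login data, the `prominence` argument may be worth trying instead of a fixed height.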