How to analyze time-series data as a function of the time of day in pandas

Suppose I have a random sample of data collected every minute for a month. I want to use pandas to analyze this data as a function of the time of day and see the differences between weekends and weekdays. If my index is a DatetimeIndex, I can do this by calculating the time of day as a decimal value between 0 and 1, manually binning the results into 10-minute intervals (or whatever), using the bins column to calculate and plot averages over each interval of the day, and then manually setting the tick positions and labels to something understandable.

However, this feels a little hacky, and I am wondering whether pandas has built-in functions for this kind of analysis. I haven't been able to find any so far.

import numpy as np
import pandas as pd

dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
vals = np.random.rand(len(dates))
df = pd.DataFrame(data={'dates': dates, 'vals': vals})
df.set_index('dates', inplace=True)

# set up a column to make the time of day a value from 0 to 1
df['day_fraction'] = (df.index.hour + df.index.minute / 60) / 24

# bin the time of day to analyze data during 10 minute intervals
df['day_bins'] = df['day_fraction'] - df['day_fraction'] % (1 / 24 / 6)

ax = df.plot('day_fraction', 'vals', marker='o', color='pink', alpha=0.05, label='')
df.groupby('day_bins')['vals'].mean().plot(ax=ax, label='average')
df[df.index.weekday < 5].groupby('day_bins')['vals'].mean().plot(ax=ax, label='weekday average')
df[df.index.weekday >= 5].groupby('day_bins')['vals'].mean().plot(ax=ax, label='weekend average')

xlabels = [label if label else 12 for label in [i % 12 for i in range(0, 25, 2)]]
xticks = [i / 24 for i in range(0, 25, 2)]
ax.set_xticks(xticks)
ax.set_xticklabels(xlabels)
ax.set_xlabel('time of day')
ax.legend()


I think you just need to use groupby with the built-in datetime attributes (the ones exposed through .dt on a Series, and available directly on a DatetimeIndex). Group on weekday versus weekend, form 10-minute bins with .floor, and calculate the mean.
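To make the two grouping keys concrete, here is a minimal sketch of what they evaluate to on a handful of timestamps. The four timestamps are made up for illustration (2018-10-05 is a Friday, 2018-10-06 a Saturday); the names idx, bins and day_type are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Four illustrative timestamps: 2018-10-05 is a Friday, 2018-10-06 a Saturday.
idx = pd.DatetimeIndex([
    '2018-10-05 09:03', '2018-10-05 09:09',
    '2018-10-06 09:12', '2018-10-06 23:59',
])

# Round each timestamp down to the start of its 10-minute bin,
# then keep only the clock time so every day shares the same bins.
bins = idx.floor('10min').time

# weekday is 0 for Monday through 6 for Sunday, so values below 5 are weekdays.
day_type = np.where(idx.weekday < 5, 'weekday', 'weekend')

for ts, b, d in zip(idx, bins, day_type):
    print(ts, b, d)
```

Because the date is dropped after flooring, 09:03 and 09:09 fall into the same 09:00 bin regardless of which day they were recorded on, which is what lets groupby average across the whole month.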

Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
vals = np.random.rand(len(dates))
df = pd.DataFrame(data={'dates': dates, 'vals': vals})
df.set_index('dates', inplace=True)

Plot

df1 = (df.groupby([np.where(df.index.weekday < 5, 'weekday', 'weekend'),
                   df.index.floor('10min').time])
         .mean()
         .rename(columns={'vals': 'average'}))

fig, ax = plt.subplots(figsize=(12,7))
df1.unstack(0).plot(ax=ax)  
# Plot Full Average
df.groupby(df.index.floor('10min').time).mean().rename(columns={'vals': 'average'}).plot(ax=ax)
plt.show()
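If the goal is to inspect the weekday/weekend difference numerically rather than only in a plot, the same grouping can be turned into a small table. This is a sketch under the same setup as above, with a seeded random generator so the run is repeatable; the table and difference names are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same one-minute sample as in the setup, but seeded for repeatability.
rng = np.random.default_rng(0)
dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
df = pd.DataFrame({'vals': rng.random(len(dates))}, index=dates)

# Grouping keys: day type and 10-minute time-of-day bin.
day_type = np.where(df.index.weekday < 5, 'weekday', 'weekend')
time_bin = df.index.floor('10min').time

# One row per 10-minute bin, one column per day type.
table = df.groupby([day_type, time_bin])['vals'].mean().unstack(0)

# Hypothetical extra column: how far the weekend mean sits from the weekday mean.
table['difference'] = table['weekend'] - table['weekday']

print(table.head())
```

With uniformly random data the difference column should hover around zero; on real data, bins where it departs clearly from zero are the times of day where weekends and weekdays behave differently.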

