
How to analyze time-series data as a function of the time of day in pandas

Suppose I have a random sample of data collected every minute for a month, and I want to use pandas to analyze it as a function of the time of day and to compare weekends with weekdays. With a DatetimeIndex I can do this by computing the time of day as a decimal value between 0 and 1, manually binning it into 10-minute intervals (or whatever width), grouping on the bin column to average over each interval of the day, and then manually setting the tick positions and labels to something readable.

However, this feels a bit hacky, and I am wondering whether pandas has built-in functions for this kind of analysis. I haven't found any so far.

import numpy as np
import pandas as pd

dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
vals = np.random.rand(len(dates))
df = pd.DataFrame(data={'dates': dates, 'vals': vals})
df.set_index('dates', inplace=True)

# set up a column to make the time of day a value from 0 to 1
df['day_fraction'] = (df.index.hour + df.index.minute / 60) / 24

# bin the time of day to analyze data during 10 minute intervals
df['day_bins'] = df['day_fraction'] - df['day_fraction'] % (1 / 24 / 6)

ax = df.plot('day_fraction', 'vals', marker='o', color='pink', alpha=0.05, label='')
df.groupby('day_bins')['vals'].mean().plot(ax=ax, label='average')
df[df.index.weekday < 5].groupby('day_bins')['vals'].mean().plot(ax=ax, label='weekday average')
df[df.index.weekday >= 5].groupby('day_bins')['vals'].mean().plot(ax=ax, label='weekend average')

xlabels = [label if label else 12 for label in [i % 12 for i in range(0, 25, 2)]]
xticks = [i / 24 for i in range(0, 25, 2)]
ax.set_xticks(xticks)
ax.set_xticklabels(xlabels)
ax.set_xlabel('time of day')
ax.legend()


I think you just need groupby together with the datetime attributes built into the index (the same ones a Series exposes through the .dt accessor). Group on weekday versus weekend, form 10-minute bins with .floor, and take the mean of each group.
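As a small illustration of the two grouping keys, the sketch below builds them for a handful of made-up timestamps (the dates and the names day_type and time_bin are chosen only for this example and are not part of the original answer):

```python
import numpy as np
import pandas as pd

# Six hypothetical timestamps: Friday 2018-10-05 and Saturday 2018-10-06
idx = pd.DatetimeIndex([
    '2018-10-05 09:03', '2018-10-05 09:09', '2018-10-05 09:10',
    '2018-10-06 09:03', '2018-10-06 09:17', '2018-10-06 23:59',
])

# weekday runs from 0 (Monday) to 6 (Sunday), so values below 5 are weekdays
day_type = np.where(idx.weekday < 5, 'weekday', 'weekend')

# floor('10min') rounds each timestamp down to the start of its 10-minute bin;
# .time then drops the date and keeps only the time of day
time_bin = idx.floor('10min').time

print(list(day_type))
print(list(time_bin))
```

Because the date is dropped, readings from different days that fall in the same 10-minute slot share one key, which is what lets groupby average across days.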

Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
vals = np.random.rand(len(dates))
df = pd.DataFrame(data={'dates': dates, 'vals': vals})
df.set_index('dates', inplace=True)

Plot

df1 = (df.groupby([np.where(df.index.weekday < 5, 'weekday', 'weekend'),
                   df.index.floor('10min').time])
         .mean()
         .rename(columns={'vals': 'average'}))

fig, ax = plt.subplots(figsize=(12,7))
df1.unstack(0).plot(ax=ax)
# Overlay the average over all days
(df.groupby(df.index.floor('10min').time)
   .mean()
   .rename(columns={'vals': 'average'})
   .plot(ax=ax))
plt.show()
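If you want the numbers rather than a plot, the same grouping can be unstacked into a table with one row per 10-minute bin. This is only a sketch under a few assumptions: the seed, the name table, and the 'HH:MM' label formatting are my own choices for illustration, not something the approach requires.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible
df = pd.DataFrame({'vals': rng.random(len(dates))}, index=dates)

day_type = np.where(df.index.weekday < 5, 'weekday', 'weekend')
time_bin = df.index.floor('10min').time

# Mean per (day type, time-of-day bin), then move the day type into columns
table = df['vals'].groupby([day_type, time_bin]).mean().unstack(0)

# Hypothetical label formatting: 'HH:MM' strings for axis ticks or reports
table.index = [t.strftime('%H:%M') for t in table.index]

print(table.shape)
print(table.head())
```

The resulting frame has a 'weekday' and a 'weekend' column, so table.plot() or a comparison such as table['weekday'] - table['weekend'] works directly on it.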

