Time series data: bin data to each day, then plot by day of week

I have a very simple pandas DataFrame with the following format:

date        P1      P2      day
2015-01-01  190     1132    Thursday
2015-01-01  225     1765    Thursday
2015-01-01  3427    29421   Thursday
2015-01-01  945     7679    Thursday
2015-01-01  1228    9537    Thursday
2015-01-01  870     6903    Thursday
2015-01-02  785     4768    Friday
2015-01-02  1137    7065    Friday
2015-01-02  175     875     Friday

where P1 and P2 are different parameters of interest. I would like to create a bar graph that looks like this for each of P1 and P2. As shown in the data, I have several values for each day. I would like to average the values for each date, then plot against day of week (so that the averaged value for Monday of week 1 is combined with the one for Monday of week 2, etc.).

I'm new to Python and my current method is quite nasty, involving several loops. I currently have two dedicated portions of code: one to compute the averages, and another to go through each day of the week one at a time and tally results for plotting. Is there a cleaner way to do this?

Seems like you're looking for:

df[['day', 'P1']].groupby('day').mean().plot(kind='bar', legend=None)

and

df[['day', 'P2']].groupby('day').mean().plot(kind='bar', legend=None)

Full example:

import numpy as np
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']
day = np.random.choice(days, size=1000)
p1, p2 = np.random.randint(low=0, high=2500, size=(2, 1000))
df = pd.DataFrame({'P1': p1, 'P2': p2, 'day': day})

# Helps for ordering of day-of-week in plot
df['day'] = pd.Categorical(df.day, categories=days)

# %matplotlib inline

df[['day', 'P1']].groupby('day').mean().plot(kind='bar', legend=None)
df[['day', 'P2']].groupby('day').mean().plot(kind='bar', legend=None)

Note that on your existing DataFrame, the call to pd.Categorical gives the day column a custom sort order (the order of the categories list), so the bars appear in weekday order rather than alphabetically.
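For a DataFrame shaped like the one in the question, where day holds full weekday names, that ordering step might look roughly like this. It is only a sketch: the rows are copied from the question's sample, and the names weekdays and means are chosen here for illustration.

```python
import pandas as pd

# Rows copied from the sample data in the question.
df = pd.DataFrame({
    'date': ['2015-01-01'] * 6 + ['2015-01-02'] * 3,
    'P1': [190, 225, 3427, 945, 1228, 870, 785, 1137, 175],
    'P2': [1132, 1765, 29421, 7679, 9537, 6903, 4768, 7065, 875],
    'day': ['Thursday'] * 6 + ['Friday'] * 3,
})

# Illustrative name: the weekday order we want on the x-axis.
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']

# An ordered categorical makes groupby results follow weekday order
# instead of alphabetical order.
df['day'] = pd.Categorical(df['day'], categories=weekdays, ordered=True)

# observed=True keeps only the weekdays that actually occur in the data.
means = df.groupby('day', observed=True)['P1'].mean()
print(means)
```

With the string column left as-is, Friday would sort before Thursday; with the categorical, Thursday comes first. Calling means.plot(kind='bar') would then draw the bars in that order.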

Result (for P1): [image]

Update

In your comment you asked,

Does groupby find the average of a given parameter (say P1) over all instances of the group? For instance, if I have 8 Mondays, is the resulting value the average of all data points that occurred on a Monday? An added hurdle here is that I have unreliable sampling for the data. If I had a Monday with 10 samples and a Monday with 1, simply averaging all 11 values would drown out the Monday with the small sample size. Thus, I would like to average all values for a given date before considering the day of week.

Yes, the groupby above averages over all rows in each group, so every sample carries equal weight. Here's how you could get that "double" averaging, first per date and then per day of week:

# For P1; use 'P2' in both places to get the P2 averages.
# Step 1: mean per calendar date. Step 2: mean of those daily means per weekday.
df.groupby(['date', 'day'])['P1'].mean()\
    .reset_index().groupby('day')['P1'].mean().plot(kind='bar')
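To see how the two approaches differ, here is a small sketch using the scenario from the comment: one Monday with 10 samples and another with 1. The values (100 and 200) and the two dates are made up purely for illustration.

```python
import pandas as pd

# Made-up data: one Monday with 10 samples of 100,
# and another Monday with a single sample of 200.
df = pd.DataFrame({
    'date': ['2015-01-05'] * 10 + ['2015-01-12'],
    'day': ['Monday'] * 11,
    'P1': [100] * 10 + [200],
})

# Pooled average: every sample carries equal weight.
pooled = df.groupby('day')['P1'].mean()

# Two-stage average: every date carries equal weight.
daily = df.groupby(['date', 'day'])['P1'].mean().reset_index()
two_stage = daily.groupby('day')['P1'].mean()

print(pooled['Monday'])     # 1200 / 11, roughly 109.09
print(two_stage['Monday'])  # (100 + 200) / 2 = 150.0
```

The pooled mean is pulled toward the heavily sampled Monday, while the two-stage mean treats both Mondays equally.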


