
Python Pandas - Time/Date, Averaging Data and plotting to a graph

I am having trouble preparing data so that it is suitable for plotting to a graph. This is my dataframe:

                          Date  Lane Lane Name  Direction DirectionName  \
0      2018-02-02 00:00:03.000     6     SB_NS          2         South   
1      2018-02-02 00:00:22.010     5    SB_MID          2         South   
2      2018-02-02 00:00:22.020     4     SB_OS          2         South   
3      2018-02-02 00:00:36.040     6     SB_NS          2         South   
4      2018-02-02 00:00:49.070     6     SB_NS          2         South   
...                        ...   ...       ...        ...           ...   
503763 2018-02-27 23:59:00.090     2    NB_MID          1         North   
503764 2018-02-27 23:59:29.090     6     SB_NS          2         South   
503765 2018-02-27 23:59:32.050     4     SB_OS          2         South   
503766 2018-02-27 23:59:33.070     6     SB_NS          2         South   
503767 2018-02-27 23:59:58.050     1     NB_NS          1         North   

        Speed (mph)  Headway (s)  Gap (s)  Flags Flag Text  
0            38.525          NaN      NaN      5    Friday  
1            32.310          NaN      NaN      5    Friday  
2            44.739          NaN      NaN      5    Friday  
3            33.554          NaN      NaN      5    Friday  
4            39.768       12.300   11.847      5    Friday  
...             ...          ...      ...    ...       ...  
503763       32.932        4.415    3.833      2   Tuesday  
503764       29.825       65.500   64.700      2   Tuesday  
503765       29.205      236.000  235.848      2   Tuesday  
503766       37.283        3.330    3.462      2   Tuesday  
503767       36.661       76.000   75.669      2   Tuesday  

[503768 rows x 10 columns]

It is traffic data. Each row is a single observation of traffic at a point in time. Flags is simply the day of the week. The data was collected on every Tuesday and Friday of the month, so the dataframe contains 8 different dates: 4 Tuesdays and 4 Fridays.

I want to plot two graphs. One graph will show only South data, the other will show only North data. Both graphs should show the average traffic volume for each hour of the day on a date of my choice (2018-02-02, for example).

So to clarify, here is what the output should be:

  • Two bar plots, one for the North direction and one for South, for 2018-02-02

  • Each bar plot should show the average traffic volume for each hour of the day.

I am just a bit confused about how to select data for a single date only, and how to get the average traffic flow per hour for that date.

So far, I have grouped by date/hour and counted the total, as shown below.

                   Date DirectionName  count
0   2018-02-02 00:00:00         North    212
1   2018-02-02 00:00:00         South    250
2   2018-02-02 01:00:00         North    130
3   2018-02-02 01:00:00         South    137
4   2018-02-02 02:00:00         North     76
...                 ...           ...    ...
379 2018-02-27 21:00:00         South    801
380 2018-02-27 22:00:00         North    425
381 2018-02-27 22:00:00         South    511
382 2018-02-27 23:00:00         North    233
383 2018-02-27 23:00:00         South    301

The problem is, the count is obviously not an average per hour. This method also uses every single date, when I only want to use a specific date, such as 2018-02-02.

  • How do I change my current method to show an average per hour instead of a total per hour?
  • How do I change my current method to show ONLY a specific date?
  • Is my current method unsuitable / is there a better method?

This is the code for my current method:

df.Date=pd.to_datetime(df.Date)
df.groupby([pd.Grouper(key='Date',freq='H'),df.DirectionName]).size().reset_index(name='count')

Some advice / clarification would be greatly appreciated :)

Filter the data down to a single day (with .loc or a boolean mask). You are already counting the number of rows per hour correctly, so after that simply group those hourly counts by day and take the average.

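import pandas as pd
df["Date"] = pd.to_datetime(df["Date"])
# Keep every row from 2018-02-02; comparing the raw timestamps to the bare date would match midnight only
df = df[df["Date"].dt.normalize() == "2018-02-02"]
# Number of observations per hour and direction
hourly = df.groupby([pd.Grouper(key='Date',freq='H'),df.DirectionName]).size().reset_index(name='count')
# Mean of the hourly counts, per day and direction
daily = hourly.groupby([hourly["Date"].dt.date, "DirectionName"])["count"].mean()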

This gives you the average amount of traffic per hour on that day. It returns a single number for each direction, and I am not sure if that is what you want. If not, do you want one of the other fields averaged? For example, the average speed for every single hour of the day?
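If the goal is one bar per hour for each direction, a minimal sketch along these lines may help. It assumes the column names shown in the question; the small sample frame and the helper names hourly_summary and plot_counts are made up for illustration and are not part of the original post. The count per hour stands in for traffic volume, and the mean speed per hour is included as one example of averaging another field.

```python
import pandas as pd

# Tiny made-up sample using the same column names as the question's dataframe.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-02-02 00:00:03", "2018-02-02 00:00:22", "2018-02-02 00:40:10",
        "2018-02-02 01:15:00", "2018-02-02 01:30:00", "2018-02-06 00:05:00",
    ]),
    "DirectionName": ["South", "South", "North", "North", "South", "North"],
    "Speed (mph)": [38.0, 32.0, 44.0, 33.0, 40.0, 30.0],
})


def hourly_summary(frame, day):
    """Per-hour row count and mean speed for one calendar day, by direction."""
    # normalize() resets the time to midnight, so every row from `day` matches.
    one_day = frame[frame["Date"].dt.normalize() == pd.Timestamp(day)]
    hour = one_day["Date"].dt.hour.rename("hour")
    grouped = one_day.groupby(["DirectionName", hour])
    return grouped.agg(count=("Speed (mph)", "size"),
                       mean_speed=("Speed (mph)", "mean"))


summary = hourly_summary(df, "2018-02-02")

# One column per direction, one row per hour of the day (0-23);
# hours without any observation are filled with 0.
counts = (summary["count"]
          .unstack("DirectionName", fill_value=0)
          .reindex(range(24), fill_value=0))

# Mean of the 24 hourly counts: one value per direction.
daily_mean = counts.mean()


def plot_counts(counts):
    """Draw one bar plot per direction (requires matplotlib)."""
    import matplotlib.pyplot as plt
    fig, axes = plt.subplots(1, len(counts.columns), sharey=True, squeeze=False)
    for ax, direction in zip(axes[0], counts.columns):
        counts[direction].plot.bar(ax=ax, title=direction)
        ax.set_xlabel("Hour of day")
        ax.set_ylabel("Vehicles per hour")
    return fig
```

Calling plot_counts(counts) would then draw the North and South bar plots side by side. On the real data, replace the sample frame with the dataframe from the question.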


 