Python Pandas - Time/Date, Averaging Data and plotting to a graph
I am having trouble preparing data so that it is suitable for plotting on a graph. This is my dataframe:
Date Lane Lane Name Direction DirectionName \
0 2018-02-02 00:00:03.000 6 SB_NS 2 South
1 2018-02-02 00:00:22.010 5 SB_MID 2 South
2 2018-02-02 00:00:22.020 4 SB_OS 2 South
3 2018-02-02 00:00:36.040 6 SB_NS 2 South
4 2018-02-02 00:00:49.070 6 SB_NS 2 South
... ... ... ... ... ...
503763 2018-02-27 23:59:00.090 2 NB_MID 1 North
503764 2018-02-27 23:59:29.090 6 SB_NS 2 South
503765 2018-02-27 23:59:32.050 4 SB_OS 2 South
503766 2018-02-27 23:59:33.070 6 SB_NS 2 South
503767 2018-02-27 23:59:58.050 1 NB_NS 1 North
Speed (mph) Headway (s) Gap (s) Flags Flag Text
0 38.525 NaN NaN 5 Friday
1 32.310 NaN NaN 5 Friday
2 44.739 NaN NaN 5 Friday
3 33.554 NaN NaN 5 Friday
4 39.768 12.300 11.847 5 Friday
... ... ... ... ... ...
503763 32.932 4.415 3.833 2 Tuesday
503764 29.825 65.500 64.700 2 Tuesday
503765 29.205 236.000 235.848 2 Tuesday
503766 37.283 3.330 3.462 2 Tuesday
503767 36.661 76.000 75.669 2 Tuesday
[503768 rows x 10 columns]
It is traffic data. Each row is a single observation of traffic at a point in time. Flags is simply the day of the week. The data was collected on every Tuesday and Friday of the month, so the dataframe contains 8 different dates: 4 Tuesdays and 4 Fridays.
I want to plot two graphs. One graph will show only South data, the other will show only North data. Both graphs should show the average traffic volume for each hour of the day on a date of my choice (2018-02-02, for example).
So to clarify, here is what the output should be:
Two bar plots, one for the North direction and one for South, for 2018-02-02.
Each bar plot should show the average traffic volume for each hour of the day.
I am just a bit confused about how to collect data for only a single date, and how to get the average traffic flow per hour for that date.
So far, I have grouped by date/hour and counted the total, as shown below.
Date DirectionName count
0 2018-02-02 00:00:00 North 212
1 2018-02-02 00:00:00 South 250
2 2018-02-02 01:00:00 North 130
3 2018-02-02 01:00:00 South 137
4 2018-02-02 02:00:00 North 76
... ... ... ...
379 2018-02-27 21:00:00 South 801
380 2018-02-27 22:00:00 North 425
381 2018-02-27 22:00:00 South 511
382 2018-02-27 23:00:00 North 233
383 2018-02-27 23:00:00 South 301
The problem is, the count is obviously not an average per hour. This method also uses every single date, when I only want to use a specific date, such as 2018-02-02.
This is the code for my current method:
df.Date=pd.to_datetime(df.Date)
df.groupby([pd.Grouper(key='Date',freq='H'),df.DirectionName]).size().reset_index(name='count')
Some advice / clarification would be greatly appreciated :)
Filter the data down to a single day, count the number of rows per hour (which you are already doing correctly), then group by day and take the mean of those hourly counts.
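import pandas as pd
df.Date = pd.to_datetime(df.Date)
# Compare on the calendar day; == "2018-02-02" on full timestamps only matches rows stamped exactly at midnight
df = df[df["Date"].dt.normalize() == pd.Timestamp("2018-02-02")]
# Number of observations per hour and direction
hourly = df.groupby([pd.Grouper(key='Date', freq='H'), df.DirectionName]).size().reset_index(name='count')
# Mean of the hourly counts over the day, per direction
daily = hourly.groupby([hourly.Date.dt.date, hourly.DirectionName])['count'].mean()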
This gives you the average amount of traffic per hour on that day. It returns a single number per direction - I am not sure if that is what you want. If not, do you want one of the other fields averaged, e.g. the average speed for every single hour of the day?
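In case it helps, here is a minimal sketch of both readings on a small made-up dataframe: the number of vehicles per hour and the mean speed per hour, for one selected date, split by direction. The sample values and the helper names hourly_summary and plot_direction are only illustrative assumptions, not part of the original data or code.

```python
import pandas as pd

# Tiny synthetic stand-in for the traffic dataframe (values are made up).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-02-02 00:00:03", "2018-02-02 00:00:22", "2018-02-02 00:40:00",
        "2018-02-02 01:15:00", "2018-02-02 01:30:00", "2018-02-06 00:10:00",
    ]),
    "DirectionName": ["South", "South", "North", "North", "South", "North"],
    "Speed (mph)": [38.0, 32.0, 40.0, 30.0, 36.0, 50.0],
})


def hourly_summary(frame, day):
    """Per-hour vehicle count and mean speed for one calendar day, by direction."""
    # normalize() drops the time of day, so every row on that date matches.
    one_day = frame[frame["Date"].dt.normalize() == pd.Timestamp(day)]
    hour = one_day["Date"].dt.hour.rename("hour")
    return one_day.groupby(["DirectionName", hour]).agg(
        count=("Speed (mph)", "size"),       # vehicles seen in that hour
        mean_speed=("Speed (mph)", "mean"),  # average speed in that hour
    )


def plot_direction(summary, direction):
    """Bar plot of vehicles per hour for one direction (requires matplotlib)."""
    counts = summary["count"].loc[direction]  # Series indexed by hour
    return counts.plot.bar(title=f"{direction}: vehicles per hour")


summary = hourly_summary(df, "2018-02-02")

# Average of the hourly counts over the day, one number per direction.
daily_mean = summary["count"].groupby(level="DirectionName").mean()
```

Calling plot_direction(summary, "South") and plot_direction(summary, "North") should then give the two bar charts. Swapping "count" for "mean_speed" inside plot_direction would plot the average speed per hour instead.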