seaborn: plotting time on x-axis
I'm working with a dataset that only contains datetime objects. I have extracted the day of the week and reformatted the time into separate columns like this (conversion functions included below):
datetime day_of_week time_of_day
0 2021-06-13 12:56:16 Sunday 20:00:00
5 2021-06-13 12:56:54 Sunday 20:00:00
6 2021-06-13 12:57:27 Sunday 20:00:00
7 2021-07-16 18:55:42 Friday 20:00:00
8 2021-07-16 18:56:03 Friday 20:00:00
9 2021-06-04 18:42:06 Friday 20:00:00
10 2021-06-04 18:49:05 Friday 20:00:00
11 2021-06-04 18:58:22 Friday 20:00:00
What I would like to do is create a KDE plot with x-axis = time_of_day (spanning 00:00:00 to 23:59:59), the y-axis showing the count of each day_of_week at each hour of the day, and hue = day_of_week. In essence, I'd have seven different distributions representing occurrences during each day of the week.
Here's a sample of the data and my code. Any help would be appreciated:
import calendar
import pandas as pd
df = pd.DataFrame([
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:54',
'2021-06-13 12:56:54',
'2021-06-13 12:57:27',
'2021-07-16 18:55:42',
'2021-07-16 18:56:03',
'2021-06-04 18:42:06',
'2021-06-04 18:49:05',
'2021-06-04 18:58:22',
'2021-06-08 21:31:44',
'2021-06-09 02:14:30',
'2021-06-09 02:20:19',
'2021-06-12 18:05:47',
'2021-06-15 23:46:41',
'2021-06-15 23:47:18',
'2021-06-16 14:19:08',
'2021-06-17 19:08:17',
'2021-06-17 22:37:27',
'2021-06-21 23:31:32',
'2021-06-23 20:32:09',
'2021-06-24 16:04:21',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-08-31 21:38:07',
'2020-08-31 21:38:22',
'2020-08-31 21:38:42',
'2020-08-31 21:39:03',
], columns=['datetime'])
def convert_date(date):
    return calendar.day_name[date.weekday()]
def convert_hour(time):
    return time[:2]+':00:00'
df['day_of_week'] = pd.to_datetime(df['datetime']).apply(convert_date)
df['time_of_day'] = df['datetime'].astype(str).apply(convert_hour)
Let's try:
Converting the datetime column with to_datetime.
Creating an ordered categorical day_of_week column.
Normalizing time_of_day to a single day (so comparisons function correctly). This makes it seem like all events occurred within the same day, making the plotting logic much simpler.
Formatting the x-axis ticks as HH:MM:SS.
import calendar
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt, dates as mdates
# df = pd.DataFrame({...})
# Convert to datetime
df['datetime'] = pd.to_datetime(df['datetime'])
# Create Categorical Column
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
df['day_of_week'] = pd.Categorical.from_codes(
    df['datetime'].dt.day_of_week, dtype=cat_type
)
# Create Normalized Date Column
df['time_of_day'] = pd.to_datetime('2000-01-01 ' +
                                   df['datetime'].dt.time.astype(str))
# Plot
ax = sns.kdeplot(data=df, x='time_of_day', hue='day_of_week')
# X axis format
ax.set_xlim([pd.to_datetime('2000-01-01 00:00:00'),
             pd.to_datetime('2000-01-01 23:59:59')])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
plt.tight_layout()
plt.show()
Note that the sample size is small here.
If you are looking for counts on the y-axis, then histplot may be a better fit:
ax = sns.histplot(data=df, x='time_of_day', hue='day_of_week')
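To check the numbers behind such a histogram, the counts per hour and weekday can also be tabulated directly. The following is only a minimal sketch using a handful of timestamps from the question; the column name hour and the variable counts are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# A few timestamps taken from the sample data in the question
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2021-06-13 12:56:16",
    "2021-06-13 12:56:54",
    "2021-06-13 12:57:27",
    "2021-07-16 18:55:42",
    "2021-06-04 18:42:06",
    "2021-06-09 02:14:30",
])})

df["day_of_week"] = df["datetime"].dt.day_name()
df["hour"] = df["datetime"].dt.hour  # illustrative helper column

# Rows are hours of the day, columns are weekdays, cells are event counts
counts = pd.crosstab(df["hour"], df["day_of_week"])
print(counts)
```

Hours with no events at all do not appear as rows; counts.reindex(range(24), fill_value=0) would pad the table to a full day if that is needed.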
I would use pandas' Timestamp straight away. By the way, your convert_hour function seems to be wrong: it gives time_of_day as 20:00:00 for all data, because time[:2] takes the first two characters of the full datetime string (the "20" of the year) rather than the hour.
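The problem can be reproduced on a single value. In this sketch, convert_hour_original copies the function from the question, while convert_hour_fixed is a hypothetical replacement written for illustration (it is not taken from the original post):

```python
import pandas as pd

def convert_hour_original(time):
    # Slices the first two characters of the full datetime string,
    # which are the leading digits of the year, not the hour
    return time[:2] + ':00:00'

def convert_hour_fixed(time):
    # Hypothetical fix: parse the string first, then format only the hour
    return pd.Timestamp(time).strftime('%H:00:00')

sample = '2021-06-13 12:56:16'
print(convert_hour_original(sample))  # 20:00:00
print(convert_hour_fixed(sample))     # 12:00:00
```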
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_context("paper", font_scale=2)
sns.set_style('whitegrid')
df['day_of_week'] = df['datetime'].apply(lambda x: pd.Timestamp(x).day_name())
df['time_of_day'] = df['datetime'].apply(lambda x: pd.Timestamp(x).hour)
plt.figure(figsize=(8, 4))
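# `days` was not defined in the original snippet; assumed here to be the weekdays present in the data
days = df['day_of_week'].unique()
for day in days:
    sns.kdeplot(df[df.day_of_week == day]['time_of_day'], label=day)
plt.legend()
plt.show()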
The KDE for Wednesday looks a bit strange because the time varies between 2 and 20, hence the long tail from -20 to 40 in the plot.
Here is a simple version using df.plot.kde.
I added more data so that multiple values are present for each day_of_week for the KDE to plot, and simplified the code to remove the functions.
df1 = pd.DataFrame([
'2020-09-01 16:39:03',
'2020-09-02 16:39:03',
'2020-09-03 16:39:03',
'2020-09-04 16:39:03',
'2020-09-05 16:39:03',
'2020-09-06 16:39:03',
'2020-09-07 16:39:03',
'2020-09-08 16:39:03',
], columns=['datetime'])
df = pd.concat([df,df1]).reset_index(drop=True)
df['day_of_week'] = pd.to_datetime(df['datetime']).dt.day_name()
df['time_of_day'] = df['datetime'].str.split(expand=True)[1].str.split(':',expand=True)[0].astype(int)
df.pivot(columns='day_of_week').time_of_day.plot.kde()