seaborn: plotting time on x-axis

I'm working with a dataset that contains only datetime objects. I have extracted the day of the week and reformatted the time into separate columns like this (conversion functions included below):

    datetime            day_of_week time_of_day
0   2021-06-13 12:56:16 Sunday      20:00:00
5   2021-06-13 12:56:54 Sunday      20:00:00
6   2021-06-13 12:57:27 Sunday      20:00:00
7   2021-07-16 18:55:42 Friday      20:00:00
8   2021-07-16 18:56:03 Friday      20:00:00
9   2021-06-04 18:42:06 Friday      20:00:00
10  2021-06-04 18:49:05 Friday      20:00:00
11  2021-06-04 18:58:22 Friday      20:00:00

What I would like to do is create a KDE plot with x-axis = time_of_day (spanning 00:00:00 to 23:59:59), y-axis = the count of each day_of_week at each hour of the day, and hue = day_of_week. In essence, I'd have seven different distributions representing occurrences during each day of the week.

Here's a sample of the data and my code. Any help would be appreciated:

import calendar

import pandas as pd

df = pd.DataFrame([
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:54',
    '2021-06-13 12:56:54',
    '2021-06-13 12:57:27',
    '2021-07-16 18:55:42',
    '2021-07-16 18:56:03',
    '2021-06-04 18:42:06',
    '2021-06-04 18:49:05',
    '2021-06-04 18:58:22',
    '2021-06-08 21:31:44',
    '2021-06-09 02:14:30',
    '2021-06-09 02:20:19',
    '2021-06-12 18:05:47',
    '2021-06-15 23:46:41',
    '2021-06-15 23:47:18',
    '2021-06-16 14:19:08',
    '2021-06-17 19:08:17',
    '2021-06-17 22:37:27',
    '2021-06-21 23:31:32',
    '2021-06-23 20:32:09',
    '2021-06-24 16:04:21',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-08-31 21:38:07',
    '2020-08-31 21:38:22',
    '2020-08-31 21:38:42',
    '2020-08-31 21:39:03',
], columns=['datetime'])

def convert_date(date):
    return calendar.day_name[date.weekday()]

def convert_hour(time):
    return time[:2]+':00:00'

df['day_of_week'] = pd.to_datetime(df['datetime']).apply(convert_date)
df['time_of_day'] = df['datetime'].astype(str).apply(convert_hour)

Let's try:

  1. Convert the datetime column with to_datetime.
  2. Create a Categorical column from the day_of_week codes (so categorical ordering works correctly).
  3. Normalize time_of_day to a single day (so comparisons work correctly). This makes it seem like all events occurred within the same day, which makes the plotting logic much simpler.
  4. Plot the kdeplot.
  5. Set the x-axis formatter to display only HH:MM:SS.

import calendar

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt, dates as mdates


# df = pd.DataFrame({...})

# Convert to datetime
df['datetime'] = pd.to_datetime(df['datetime'])
# Create Categorical Column
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
df['day_of_week'] = pd.Categorical.from_codes(
    df['datetime'].dt.day_of_week, dtype=cat_type
)
# Create Normalized Date Column
df['time_of_day'] = pd.to_datetime('2000-01-01 ' +
                                   df['datetime'].dt.time.astype(str))

# Plot
ax = sns.kdeplot(data=df, x='time_of_day', hue='day_of_week')

# X axis format
ax.set_xlim([pd.to_datetime('2000-01-01 00:00:00'),
             pd.to_datetime('2000-01-01 23:59:59')])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))

plt.tight_layout()
plt.show()
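
Steps 2 and 3 are easier to trust once they are checked in isolation. The sketch below uses a hypothetical three-row frame (`sample`, not part of the original data) and shows that the ordered categorical sorts in calendar order rather than alphabetically, and that every normalized timestamp lands on the same placeholder date. For the normalization it uses datetime arithmetic instead of the string round trip above; that is a swapped-in alternative, not what the answer's code does.

```python
import calendar

import pandas as pd

# Hypothetical mini-sample (Friday, Wednesday, Sunday), for illustration only
sample = pd.DataFrame({'datetime': pd.to_datetime([
    '2021-06-04 18:42:06',
    '2021-06-09 02:14:30',
    '2021-06-13 12:56:16',
])})

# Step 2: ordered categorical, Monday=0 ... Sunday=6
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
sample['day_of_week'] = pd.Categorical.from_codes(
    sample['datetime'].dt.dayofweek, dtype=cat_type
)

# Sorting follows calendar order (Wednesday, Friday, Sunday),
# whereas plain strings would sort alphabetically (Friday, Sunday, Wednesday)
sorted_days = sample.sort_values('day_of_week')['day_of_week'].astype(str).tolist()

# Step 3 (alternative): keep the time part as a Timedelta and add it to a fixed date
anchor = pd.Timestamp('2000-01-01')  # arbitrary placeholder date
sample['time_of_day'] = anchor + (
    sample['datetime'] - sample['datetime'].dt.normalize()
)

print(sorted_days)
print(sample['time_of_day'].tolist())
```

Either way of normalizing should give the same `time_of_day` values; the arithmetic version just avoids converting to strings and back.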

Note that the sample size is small here:

[Image: kdeplot]

If you are looking for counts on the y-axis, then histplot may be a better fit:

ax = sns.histplot(data=df, x='time_of_day', hue='day_of_week')

[Image: histogram]
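
If what you need is literally the number of events per weekday in each hour, that table can also be computed directly and then plotted however you like. A minimal sketch, assuming a small made-up sample (`sample` and `counts` are names chosen here for illustration):

```python
import pandas as pd

# Hypothetical sample, for illustration only
sample = pd.DataFrame({'datetime': pd.to_datetime([
    '2021-06-13 12:56:16',  # Sunday
    '2021-06-13 12:56:54',  # Sunday
    '2021-06-04 18:42:06',  # Friday
    '2021-06-09 02:14:30',  # Wednesday
])})

# Rows: hour of day; columns: weekday name; cells: number of events
counts = pd.crosstab(
    sample['datetime'].dt.hour.rename('hour'),
    sample['datetime'].dt.day_name().rename('day_of_week'),
)
# Make sure every hour 0-23 appears, even those with no events
counts = counts.reindex(range(24), fill_value=0)

print(counts.loc[[2, 12, 18]])
```

Calling `counts.plot()` on the result should then draw one line per weekday with true counts on the y-axis, rather than a density.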

I would use pandas' Timestamp straight away. By the way, your convert_hour function seems to be wrong: it gives time_of_day as 20:00:00 for all rows.
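
To see where the 20 comes from: `time[:2]` takes the first two characters of the whole string, which are the first two digits of the year, not the hour. A quick check, with a hypothetical corrected helper (`convert_hour_fixed` is a name made up here, not part of the question) that parses the string first:

```python
from datetime import datetime


def convert_hour(time):
    # Original helper from the question: slices the start of the full string
    return time[:2] + ':00:00'


def convert_hour_fixed(time):
    # Hypothetical fix: parse the timestamp, then format only the hour
    parsed = datetime.strptime(time, '%Y-%m-%d %H:%M:%S')
    return parsed.strftime('%H:00:00')


stamp = '2021-06-13 12:56:16'
print(convert_hour(stamp))        # 20:00:00 (the "20" of 2021)
print(convert_hour_fixed(stamp))  # 12:00:00
```

The code in this answer avoids the slicing altogether by going through pd.Timestamp.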

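import calendar

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# df is the DataFrame from the question (a 'datetime' column of strings)

sns.set_context("paper", font_scale=2)
sns.set_style('whitegrid')

# Derive the weekday name and the hour (0-23) from each timestamp
df['day_of_week'] = df['datetime'].apply(lambda x: pd.Timestamp(x).day_name())
df['time_of_day'] = df['datetime'].apply(lambda x: pd.Timestamp(x).hour)

plt.figure(figsize=(8, 4))

# `days` was not defined in the original snippet; Monday..Sunday is assumed here
days = list(calendar.day_name)
for day in days:
    sns.kdeplot(df[df.day_of_week == day]['time_of_day'], label=day)

plt.legend()
plt.show()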

[Image: kdeplot]

The KDE for Wednesday looks a bit strange because the hours range from 2 to 20, hence the long tail from -20 to 40 in the plot.

Here is a simple version using df.plot.kde.

I added more data so that each day_of_week has multiple values for the KDE to plot, and simplified the code by removing the helper functions.

df1 = pd.DataFrame([
    '2020-09-01 16:39:03',
    '2020-09-02 16:39:03',
    '2020-09-03 16:39:03',
    '2020-09-04 16:39:03',
    '2020-09-05 16:39:03',
    '2020-09-06 16:39:03',
    '2020-09-07 16:39:03',
    '2020-09-08 16:39:03',
], columns=['datetime'])
df = pd.concat([df,df1]).reset_index(drop=True)
df['day_of_week'] = pd.to_datetime(df['datetime']).dt.day_name()
df['time_of_day'] = df['datetime'].str.split(expand=True)[1].str.split(':',expand=True)[0].astype(int)
df.pivot(columns='day_of_week').time_of_day.plot.kde()

Plots: [Image]
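
The `pivot(columns='day_of_week')` call is what makes `plot.kde()` draw one curve per day: it spreads `time_of_day` into one column per weekday, with NaN wherever a row belongs to another day. A small sketch on made-up rows (the `sample` frame is hypothetical) shows the resulting shape:

```python
import pandas as pd

# Hypothetical rows, for illustration only
sample = pd.DataFrame({
    'day_of_week': ['Friday', 'Friday', 'Sunday', 'Sunday'],
    'time_of_day': [18, 16, 12, 16],
})

# One column per weekday; each row keeps its value only under its own day
wide = sample.pivot(columns='day_of_week')['time_of_day']
print(wide)
```

`wide.plot.kde()` then estimates each column separately, ignoring the NaN entries (it needs SciPy installed).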
