Adding multi level X axis to matplotlib/Seaborn (month and year)

I am having difficulty adding a multi-level axis with month and then year to my plot, and I have not been able to find an answer anywhere. I have a dataframe that contains the upload date as a datetime dtype, plus the year and month for each row. See below:

    Upload Date Year    Month      DocID
0   2021-03-22  2021    March      DOC146984
1   2021-12-16  2021    December   DOC173111
2   2021-12-07  2021    December   DOC115350
3   2021-10-29  2021    October    DOC150149
4   2021-03-12  2021    March      DOC125480
5   2021-06-25  2021    June       DOC101062
6   2021-05-03  2021    May        DOC155916
7   2021-11-14  2021    November   DOC198519
8   2021-03-20  2021    March      DOC159523
9   2021-07-19  2021    July       DOC169328
10  2021-04-13  2021    April      DOC182660
11  2021-10-08  2021    October    DOC176871
12  2021-09-19  2021    September  DOC185854
13  2021-05-16  2021    May        DOC192329
14  2021-06-29  2021    June       DOC142190
15  2021-11-30  2021    November   DOC140231
16  2021-11-12  2021    November   DOC145392
17  2021-11-10  2021    November   DOC178159
18  2021-11-06  2021    November   DOC160932
19  2021-06-16  2021    June       DOC131448

What I am trying to achieve is a bar chart showing the number of documents in each month of each year. The graph would look something like this:

[Image: example chart with the months on the x-axis and the years below them as a second level]

The main thing is that the x axis is split by month and then further by year, rather than each column being labelled with both month and year (e.g. 'March 2021'). However, I can't figure out how to achieve this. I've tried using a countplot, but it only lets me choose month or year (see below). I have also tried groupby, but the end result is always the same. Any ideas?

[Image: countplot output]

This uses randomly generated data; the code to reproduce it is below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
from datetime import date, timedelta
from random import choices
np.random.seed(42)
  
# initializing dates ranges 
test_date1, test_date2 = date(2020, 1, 1), date(2021, 6, 30)
  
# initializing K
K = 2000
  
res_dates = [test_date1]
  
# loop to get each date till end date
while test_date1 != test_date2:
    test_date1 += timedelta(days=1)
    res_dates.append(test_date1)
  
# random K dates from pack
res = choices(res_dates, k=K)

# Generating dataframe
df = pd.DataFrame(res, columns=['Upload Date'])

# Generate other columns
df['Upload Date'] = pd.to_datetime(df['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
df['Month'] = df['Upload Date'].dt.month_name()
df['DocID'] = np.random.randint(100000,200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']

# plotting graph
sns.set_color_codes("pastel")
f, ax = plt.subplots(figsize=(20,8))
sns.countplot(x='Month', data=df)

A new column holding the year and month in numeric form (e.g. 202103) can serve as the x-positions, so the bars come out correctly ordered. The x-tick labels can then be renamed to the month names. Vertical lines and manually placed year labels complete the plot:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

test_date1, test_date2 = '20200101', '20210630'

# month names January..December; 'MS' (month start) avoids the 'M' alias deprecated in pandas 2.2
months = pd.date_range('2021-01-01', periods=12, freq='MS').strftime('%B')
K = 2000
df = pd.DataFrame(np.random.choice(pd.date_range(test_date1, test_date2), K), columns=['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
# df['Month'] = pd.Categorical(df['Upload Date'].dt.strftime('%B'), categories=months)
df['YearMonth'] = df['Upload Date'].dt.strftime('%Y%m').astype(int)
df['DocID'] = np.random.randint(100000, 200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']

sns.set_style("white")
sns.set_color_codes("pastel")
fig, ax = plt.subplots(figsize=(20, 8))
sns.countplot(x='YearMonth', data=df, ax=ax)
sns.despine()
yearmonth_labels = [int(l.get_text()) for l in ax.get_xticklabels()]
ax.set_xticklabels([months[ym % 100 - 1] for ym in yearmonth_labels])
ax.set_xlabel('')

# calculate the positions of the borders between the years
pos = []
years = []
prev = None
for i, ym in enumerate(yearmonth_labels):
    if ym // 100 != prev:
        pos.append(i)
        prev = ym // 100
        years.append(prev)
pos.append(len(yearmonth_labels))
pos = np.array(pos) - 0.5
# vertical lines to separate the years
ax.vlines(pos, 0, -0.12, color='black', lw=0.8, clip_on=False, transform=ax.get_xaxis_transform())
# years at the center of their range
for year, pos0, pos1 in zip(years, pos[:-1], pos[1:]):
    ax.text((pos0 + pos1) / 2, -0.07, year, ha='center', clip_on=False, transform=ax.get_xaxis_transform())

ax.set_xlim(pos[0], pos[-1])
ax.set_ylim(ymin=0)
plt.tight_layout()
plt.show()

[Image: sns.countplot with a two-level x-axis]
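
To check the year-border calculation on its own, the loop from the code above can be pulled into a small helper and run on a short list of YearMonth values. This is only a sketch: the name `year_spans` and the sample list are made up for illustration and are not part of the original answer, and it assumes the labels are sorted `YYYYMM` integers, as they are for the countplot above.

```python
def year_spans(yearmonth_labels):
    """Return (years, borders) for a sorted list of YYYYMM integers.

    borders[i] and borders[i + 1] are the x-positions, in bar-index
    units, of the left and right edge of years[i].
    """
    years, starts = [], []
    for i, ym in enumerate(yearmonth_labels):
        year = ym // 100  # 202103 -> 2021
        if not years or year != years[-1]:
            years.append(year)
            starts.append(i)
    starts.append(len(yearmonth_labels))
    # bars are centred on integer positions, so each border sits
    # half a bar to the left of the first bar of a year
    borders = [s - 0.5 for s in starts]
    return years, borders


# hypothetical sample: two months of 2020 followed by three of 2021
labels = [202011, 202012, 202101, 202102, 202103]
years, borders = year_spans(labels)
centers = [(a + b) / 2 for a, b in zip(borders[:-1], borders[1:])]
print(years)    # [2020, 2021]
print(borders)  # [-0.5, 1.5, 4.5]
print(centers)  # [0.5, 3.0]
```

The `borders` values are where the vertical lines go, and `centers` are the x-positions at which the year labels are drawn.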
