
Adding multi level X axis to matplotlib/Seaborn (month and year)

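I am having difficulty adding a multi-level axis with month and then year to my plot, and I have been unable to find an answer anywhere. I have a dataframe that contains the upload date as a datetime dtype, plus the year and month for each row. See below: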

    Upload Date Year    Month      DocID
0   2021-03-22  2021    March      DOC146984
1   2021-12-16  2021    December   DOC173111
2   2021-12-07  2021    December   DOC115350
3   2021-10-29  2021    October    DOC150149
4   2021-03-12  2021    March      DOC125480
5   2021-06-25  2021    June       DOC101062
6   2021-05-03  2021    May        DOC155916
7   2021-11-14  2021    November   DOC198519
8   2021-03-20  2021    March      DOC159523
9   2021-07-19  2021    July       DOC169328
10  2021-04-13  2021    April      DOC182660
11  2021-10-08  2021    October    DOC176871
12  2021-09-19  2021    September  DOC185854
13  2021-05-16  2021    May        DOC192329
14  2021-06-29  2021    June       DOC142190
15  2021-11-30  2021    November   DOC140231
16  2021-11-12  2021    November   DOC145392
17  2021-11-10  2021    November   DOC178159
18  2021-11-06  2021    November   DOC160932
19  2021-06-16  2021    June       DOC131448

What I am trying to achieve is a bar chart showing the count of documents for each month and year. The graph would look something like this:


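The main point is that the x axis is split by month and then by year, rather than each column being labelled with both month and year (e.g. "March 2021"). However, I can't figure out how to achieve this. I have tried a countplot, but it only lets me choose either month or year (see below). I have also tried groupby, but the end result is always the same. Any ideas?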



import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
from datetime import date, timedelta
from random import choices
# initializing dates ranges 
test_date1, test_date2 = date(2020, 1, 1), date(2021, 6, 30)
# initializing K
K = 2000
res_dates = [test_date1]
# loop to collect each date up to and including the end date
while test_date1 != test_date2:
    test_date1 += timedelta(days=1)
    res_dates.append(test_date1)
# random K dates from pack
res = choices(res_dates, k=K)

# Generating dataframe
df = pd.DataFrame(res, columns=['Upload Date'])

# Generate other columns
df['Upload Date'] = pd.to_datetime(df['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
df['Month'] = df['Upload Date'].dt.month_name()
df['DocID'] = np.random.randint(100000,200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']

# plotting graph
f, ax = plt.subplots(figsize=(20,8))
sns.countplot(x='Month', data=df)

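A new column with the year and month in numeric form can be used to give the x positions in the correct order. The x-tick labels can then be renamed to the month names. Vertical lines and manually placed year labels lead to the final plot: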

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

test_date1, test_date2 = '20200101', '20210630'

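# month names January..December; 'MS' (month start) avoids the deprecated 'M' alias in recent pandas
months = pd.date_range('2021-01-01', periods=12, freq='MS').strftime('%B')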
K = 2000
df = pd.DataFrame(np.random.choice(pd.date_range(test_date1, test_date2), K), columns=['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
# df['Month'] = pd.Categorical(df['Upload Date'].dt.strftime('%B'), categories=months)
df['YearMonth'] = df['Upload Date'].dt.strftime('%Y%m').astype(int)
df['DocID'] = np.random.randint(100000, 200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']

fig, ax = plt.subplots(figsize=(20, 8))
sns.countplot(x='YearMonth', data=df, ax=ax)
yearmonth_labels = [int(l.get_text()) for l in ax.get_xticklabels()]
ax.set_xticklabels([months[ym % 100 - 1] for ym in yearmonth_labels])

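# calculate the positions of the borders between the years
pos = []
years = []
prev = None
for i, ym in enumerate(yearmonth_labels):
    if ym // 100 != prev:
        pos.append(i)  # first bar of a new year
        prev = ym // 100
        years.append(prev)
pos.append(len(yearmonth_labels))  # closing border after the last bar
pos = np.array(pos) - 0.5  # shift from bar centers to the gaps between bars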
# vertical lines to separate the years
ax.vlines(pos, 0, -0.12, color='black', lw=0.8, clip_on=False, transform=ax.get_xaxis_transform())
# years at the center of their range
for year, pos0, pos1 in zip(years, pos[:-1], pos[1:]):
    ax.text((pos0 + pos1) / 2, -0.07, year, ha='center', clip_on=False, transform=ax.get_xaxis_transform())

ax.set_xlim(pos[0], pos[-1])

Figure: sns.countplot with a two-level x axis
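
The year-border step can also be checked on its own, without drawing anything. The following is only a minimal sketch: the helper name `year_boundaries` and the sample labels are made up for illustration, and it assumes the labels are sorted integers of the form YYYYMM, as produced by the `YearMonth` column above.

```python
import numpy as np

def year_boundaries(yearmonth_labels):
    """Hypothetical helper: return (pos, years) for sorted YYYYMM integers.

    pos holds the x positions of the borders between years (bar i is
    centered on x = i), years holds one entry per span between borders.
    """
    pos, years, prev = [], [], None
    for i, ym in enumerate(yearmonth_labels):
        year = ym // 100
        if year != prev:
            pos.append(i)        # index of the first bar of a new year
            years.append(year)
            prev = year
    pos.append(len(yearmonth_labels))  # closing border after the last bar
    return np.array(pos) - 0.5, years

# made-up sample: two months of 2020 followed by three months of 2021
labels = [202011, 202012, 202101, 202102, 202103]
pos, years = year_boundaries(labels)
print(pos.tolist(), years)
```

Each year label is then placed at the midpoint of two consecutive entries of `pos`, which is what the `zip(years, pos[:-1], pos[1:])` loop in the answer does.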

