Adding multi-level X axis to matplotlib/Seaborn (month and year)
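I'm having a hard time adding a multi-level axis with month and year to a plot, and I haven't been able to find an answer anywhere. I have a dataframe that contains the upload date as a datetime dtype, plus the year and month for each row. See below: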
    Upload Date  Year  Month      DocID
0   2021-03-22   2021  March      DOC146984
1   2021-12-16   2021  December   DOC173111
2   2021-12-07   2021  December   DOC115350
3   2021-10-29   2021  October    DOC150149
4   2021-03-12   2021  March      DOC125480
5   2021-06-25   2021  June       DOC101062
6   2021-05-03   2021  May        DOC155916
7   2021-11-14   2021  November   DOC198519
8   2021-03-20   2021  March      DOC159523
9   2021-07-19   2021  July       DOC169328
10  2021-04-13   2021  April      DOC182660
11  2021-10-08   2021  October    DOC176871
12  2021-09-19   2021  September  DOC185854
13  2021-05-16   2021  May        DOC192329
14  2021-06-29   2021  June       DOC142190
15  2021-11-30   2021  November   DOC140231
16  2021-11-12   2021  November   DOC145392
17  2021-11-10   2021  November   DOC178159
18  2021-11-06   2021  November   DOC160932
19  2021-06-16   2021  June       DOC131448
What I want to build is a bar chart showing the number of documents for each month and year. The plot should look something like this:
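The main point is that the x-axis should be split by month and then by year, rather than labeling every column with both month and year (e.g. "March 2021"). However, I can't figure out how to do this. I've tried using countplot, but it only lets me choose either month or year (see below). I've also tried groupby, but the end result is always the same. Any ideas?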
This uses randomly generated data; the code below reproduces it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import date, timedelta
from random import choices, seed
np.random.seed(42)
seed(42)  # also seed the stdlib RNG used by choices()
# initializing the date range
test_date1, test_date2 = date(2020, 1, 1), date(2021, 6, 30)
# number of random dates to draw
K = 2000
res_dates = [test_date1]
# loop to collect every date up to the end date
while test_date1 != test_date2:
    test_date1 += timedelta(days=1)
    res_dates.append(test_date1)
# draw K random dates (with replacement) from the list
res = choices(res_dates, k=K)
# Generating dataframe
df = pd.DataFrame(res, columns=['Upload Date'])
# Generate other columns
df['Upload Date'] = pd.to_datetime(df['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
df['Month'] = df['Upload Date'].dt.month_name()
df['DocID'] = np.random.randint(100000, 200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']
# plotting graph: counting by month name merges the same month across different years
sns.set_color_codes("pastel")
f, ax = plt.subplots(figsize=(20, 8))
sns.countplot(x='Month', data=df, ax=ax)
plt.show()
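Regarding the groupby attempt: grouping on the month name alone merges, for example, March 2020 and March 2021 into a single bar. Grouping on a monthly period keeps each calendar month separate. A minimal sketch with a few hand-made dates (illustrative data, not the question's):

```python
import pandas as pd

# A few upload dates spanning two years (illustrative data only)
df = pd.DataFrame({'Upload Date': pd.to_datetime([
    '2020-03-05', '2020-03-20', '2021-03-12', '2021-03-22', '2021-04-13',
])})

# Group on a monthly Period so March 2020 and March 2021 stay separate;
# groupby sorts the periods chronologically by default
counts = df.groupby(df['Upload Date'].dt.to_period('M')).size()
print(counts)
```

The resulting counts are in chronological order, but plotting them still gives one flat axis, which is why the answer below adds the year level by hand.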
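A new numeric column that combines year and month (e.g. 202103 for March 2021) can be used as the x variable, so the bars come out in chronological order. The x-tick labels can then be renamed to month names. Vertical separator lines and manually placed year labels give the final plot: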
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
test_date1, test_date2 = '20200101', '20210630'
# month names in calendar order: January ... December
months = pd.date_range('2021-01-01', periods=12, freq='MS').strftime('%B')
K = 2000
df = pd.DataFrame(np.random.choice(pd.date_range(test_date1, test_date2), K), columns=['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
# df['Month'] = pd.Categorical(df['Upload Date'].dt.strftime('%B'), categories=months)
# numeric key YYYYMM (e.g. 202103), which sorts chronologically
df['YearMonth'] = df['Upload Date'].dt.strftime('%Y%m').astype(int)
df['DocID'] = np.random.randint(100000, 200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']
sns.set_style("white")
sns.set_color_codes("pastel")
fig, ax = plt.subplots(figsize=(20, 8))
# fix the bar order explicitly instead of reading it back from the tick labels
yearmonth_labels = sorted(int(ym) for ym in df['YearMonth'].unique())
sns.countplot(x='YearMonth', data=df, order=yearmonth_labels, ax=ax)
sns.despine()
# bars sit at x = 0, 1, 2, ...; relabel them with the month name only
ax.set_xticks(range(len(yearmonth_labels)))
ax.set_xticklabels([months[ym % 100 - 1] for ym in yearmonth_labels])
ax.set_xlabel('')
# calculate the positions of the borders between the years
pos = []
years = []
prev = None
for i, ym in enumerate(yearmonth_labels):
    if ym // 100 != prev:
        pos.append(i)
        prev = ym // 100
        years.append(prev)
pos.append(len(yearmonth_labels))
pos = np.array(pos) - 0.5
# vertical lines to separate the years (x in data coordinates, y in axes coordinates)
ax.vlines(pos, 0, -0.12, color='black', lw=0.8, clip_on=False, transform=ax.get_xaxis_transform())
# years at the center of their range
for year, pos0, pos1 in zip(years, pos[:-1], pos[1:]):
    ax.text((pos0 + pos1) / 2, -0.07, year, ha='center', clip_on=False, transform=ax.get_xaxis_transform())
ax.set_xlim(pos[0], pos[-1])
ax.set_ylim(bottom=0)
plt.tight_layout()
plt.show()
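The year-border bookkeeping in the loop above can also be pulled out into a small helper, which makes the logic easy to check without drawing anything. This is a sketch, not part of the original answer: `to_yearmonth` and `year_spans` are hypothetical names, and it assumes the YYYYMM keys are sorted and the bars sit at x = 0, 1, 2, ... as they do with `countplot`.

```python
from datetime import date
from itertools import groupby

def to_yearmonth(d):
    # Encode a date as an integer YYYYMM, e.g. 2021-03-22 -> 202103
    return d.year * 100 + d.month

def year_spans(yearmonths):
    """Return (year, left_edge, right_edge) for each run of bars in one year.

    Assumes the YYYYMM keys are sorted and bar i is centred at x = i,
    so bar edges fall at i - 0.5.
    """
    spans = []
    start = 0
    # ym // 100 recovers the year; consecutive keys with the same year form one run
    for year, group in groupby(yearmonths, key=lambda ym: ym // 100):
        n = len(list(group))
        spans.append((year, start - 0.5, start + n - 0.5))
        start += n
    return spans

dates = [date(2020, 11, 3), date(2021, 2, 9), date(2020, 12, 16), date(2021, 1, 7), date(2021, 3, 22)]
keys = sorted({to_yearmonth(d) for d in dates})
print(keys)              # [202011, 202012, 202101, 202102, 202103]
print(year_spans(keys))  # [(2020, -0.5, 1.5), (2021, 1.5, 4.5)]
```

The edges in each tuple correspond to the `pos` values passed to `ax.vlines`, and `(left + right) / 2` is the x-position of the year label passed to `ax.text`.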
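Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.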