Create facet plot of pandas dataframe with month of year on x-axis

I have a data series consisting of monthly sales for individual fiscal years. I am using a pandas dataframe to store the data. Each fiscal year starts on the first day of March and ends on the last day of the February in the following year. I am using a plotly facet plot to show the months of the year vertically aligned, so that March 2021 is below March 2020, and so on.

Despite using a categorical variable for the x-axis, the ordering is slightly off. I have tried sorting using a 'yearmon' variable with unique values, but that doesn't work either. Specifically, in the plot below the values for Jan and Feb in 2018 are blank, and Jan and Feb 2021 are also out of place. How can I get the facet to show contiguous data without these problems? Edit: I have a feeling it is related to the ordering of the categories, but haven't managed to pin it down yet.

[Image: facet plot using plotly and a pandas dataframe]

import pandas as pd
import numpy as np
import plotly.express as px
import chart_studio.plotly as py

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

fig = px.bar(df, x = 'month', y = 'A', facet_col='year', facet_col_wrap=1)
py.image.save_as(fig, 'plotly.png', width=1000, height=500)

UPDATE

Using @vestland's code below as a base, I have tweaked the start date and the fiscal year assignment as per my comment below, because fiscal years are often not aligned with the calendar year. Also, the length of the data series is arbitrary - it might be a few months, it might be a decade - and so are the start and end months. Finally, I would like the x-axis to begin and end with the first and last months of the fiscal year, so in this case (March and February) 'Mar' should be the first tick mark on the left, and 'Feb' the last one on the right. My apologies if this was not sufficiently clear.

import pandas as pd
import numpy as np
import plotly.express as px
import chart_studio.plotly as py

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()

This seems to give the following: [Image: plot using a non-calendar fiscal year]

If I understand correctly, you seem to be doing everything right apart from one minor detail. That is a bit surprising, so there's a fair chance I've misunderstood the premise of your question. Anyway...

Specifically, in the plot below the values for Jan and Feb in 2018 are blank

That's because no such dates exist in the data. The index starts at the end of March 2018, as df.head() shows:

             A  year month monthindex yearmon
2018-03-31  93  2018   Mar         03  201803
2018-04-30  84  2018   Apr         04  201804
2018-05-31  95  2018   May         05  201805
2018-06-30  86  2018   Jun         06  201806
2018-07-31  84  2018   Jul         07  201807
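
To see which year/month combinations actually have rows, a cross-tabulation can help. The following is only a sketch: it rebuilds the same 36 month-end dates as the question (without the random values) and uses month numbers rather than abbreviations so the result does not depend on locale. The name `counts` is made up for this illustration.

```python
import pandas as pd

# Same 36 month-end dates as in the question, starting at the end of March 2018.
idx = pd.date_range("2018-03-31", periods=36, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"year": idx.strftime("%Y"), "monthnum": idx.month}, index=idx)

# Rows are calendar years, columns are month numbers (1 to 12),
# and each cell counts how many rows fall in that year and month.
counts = pd.crosstab(df["year"], df["monthnum"])
print(counts)
```

A zero in the table means that facet has nothing to draw for that month, which is why January and February are empty in the 2018 panel when faceting on the calendar year.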

And if I understand your intentions correctly, you would in fact like to associate January and February of 2019 with the first x-axis. Despite your thorough effort, no such association has been made. I'm not quite sure how you would do that, but if you make sure to set up something like this:

df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12

And get:

[Image]

Then you can run

fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year',facet_col_wrap=1)

And get:

[Image]

As you can see, January and February of 2019 now appear on the x-axis of 2018, and so on for the rest of the years. I hope this is what you were looking for. Don't hesitate to let me know if not.

Complete code:

import pandas as pd
import numpy as np
import plotly.express as px
import chart_studio.plotly as py

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()
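
The hard-coded fiscal_year list above only works for a series of exactly this length and start date. As a rough sketch, the label could instead be derived from the dates themselves. This assumes the fiscal year is named after the calendar year in which it starts, and that it starts in March; the helper name `fiscal_year` and the constant `FISCAL_START_MONTH` are made up for this illustration.

```python
import numpy as np
import pandas as pd

FISCAL_START_MONTH = 3  # assumption: fiscal year starts in March, as in the question


def fiscal_year(index, start_month=FISCAL_START_MONTH):
    """Label each date with the calendar year in which its fiscal year began."""
    # Months before the start month belong to the fiscal year
    # that began in the previous calendar year.
    return np.where(index.month < start_month, index.year - 1, index.year)


# 36 month-end dates starting at the end of January 2018.
idx = pd.date_range("2018-01-31", periods=36, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"A": range(36)}, index=idx)
df["fiscal_year"] = fiscal_year(df.index)
```

For a series starting in January 2018 this gives the same labels as [2017]*2+[2018]*12+[2019]*12+[2020]*10, but it keeps working if the series is shorter, longer, or starts in a different month.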

The issue in this case appears to be that plotly does not respect the order of the categories in the pandas data series used for the x-axis unless specifically instructed to do so, as pointed out in the plotly forum here, and documented here. Passing category_orders to the px.bar call overrides plotly's default ordering and creates an x-axis that runs from the first month of the specified fiscal year to the last.

import pandas as pd
import numpy as np
import plotly.express as px
import chart_studio.plotly as py

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10

fig = px.bar(
    df, x='month', y='A',
    facet_col='fiscal_year',
    facet_col_wrap=1,
    # replaces plotly's default ordering for the "month" column
    category_orders={"month": month_categories},
)
fig.show()

[Image: facet plot of a pandas dataframe using ordered categories]
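
Since the question mentions that the first and last months of the fiscal year may vary, the month order could also be generated rather than typed out. This is a minimal sketch: the helper name `fiscal_month_order` is made up, and the English abbreviations are hard-coded on the assumption that they match what strftime('%b') produces for the data.

```python
# English month abbreviations in calendar order (assumed to match strftime('%b')).
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def fiscal_month_order(start_month):
    """Return the month abbreviations rotated so that start_month (1-12) comes first."""
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be between 1 and 12")
    i = start_month - 1
    return MONTHS[i:] + MONTHS[:i]


month_order = fiscal_month_order(3)
print(month_order)
```

The resulting list can be used both as the categories argument of pd.Categorical and as the value for "month" in category_orders.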
