Create facet plot of pandas dataframe with month of year on x-axis
I have a data series consisting of monthly sales for individual fiscal years. I am using a pandas dataframe to store the data. Each fiscal year starts on the first day of March and ends on the last day of February in the following year. I am using a plotly facet plot to show the months of the year vertically aligned, so that March 2021 is below March 2020, and so on.
Despite using a categorical variable for the x-axis, the ordering is slightly off. I have tried sorting using a 'yearmon' variable with unique values, but that doesn't work either. Specifically, in the plot below the values for Jan and Feb in 2018 are blank, and Jan and Feb 2021 are also out of place. How can I get the facet to show contiguous data without these problems?
Edit: I have a feeling it is related to the ordering of the categories, but haven't managed to pin it down yet.
import pandas as pd
import numpy as np
import plotly.express as px
import chart_studio.plotly as py
rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']
month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")
fig = px.bar(df, x = 'month', y = 'A', facet_col='year', facet_col_wrap=1)
py.image.save_as(fig, 'plotly.png', width=1000, height=500)
UPDATE
Using @vestland's code below as a base, I have tweaked the start date and the fiscal year assignment as per my comment below, because fiscal years are often not aligned with the calendar year. Also, the length of the data series is arbitrary (it might be a few months, it might be a decade), and so are the start and end months. Finally, I would like the x-axis to begin and end with the first and last months of the fiscal year, so in this case (March and February) 'Mar' should be the first tick mark on the left, and 'Feb' the last one on the right. My apologies if this was not sufficiently clear.
import pandas as pd
import numpy as np
import plotly.express as px
rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']
month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")
df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()
If I understand correctly, you seem to be doing everything right apart from one minor detail. That is a bit surprising, so there's a fair chance I've misunderstood the premise of your question. Anyway...
Quoting the question: "Specifically, in the plot below the values for Jan and Feb in 2018 are blank"
That's because no such dates exist in the data. The series starts in March 2018, as df.head() shows:
             A  year month monthindex yearmon
2018-03-31  93  2018   Mar         03  201803
2018-04-30  84  2018   Apr         04  201804
2018-05-31  95  2018   May         05  201805
2018-06-30  86  2018   Jun         06  201806
2018-07-31  84  2018   Jul         07  201807
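To make the gap concrete, here is a minimal sketch that lists which month numbers occur in each calendar year of the question's date range. The names idx and months_by_year are made up for this illustration, and freq="MS" (month start) is an assumption swapped in for the question's "M" so the snippet does not depend on version-specific frequency aliases; the year/month pairs are the same.

```python
import pandas as pd

# Same 36-month span as the question's first example. "MS" (month start) is
# used instead of "M" only for illustration; the year/month pairs are identical.
idx = pd.date_range("2018-03-01", periods=36, freq="MS")

# Map each calendar year to the month numbers that actually occur in it.
months_by_year = {
    int(year): [int(m) for m in idx.month[idx.year == year]]
    for year in sorted(set(idx.year))
}

for year, months in months_by_year.items():
    print(year, months)
```

Calendar year 2018 only contains months 3 to 12, and 2021 only contains months 1 and 2, so faceting on the calendar year leaves Jan and Feb empty in the 2018 panel and produces a 2021 panel holding only those two months.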
And if I understand your intentions correctly, you would in fact like to associate January and February of 2019 with the first x-axis. Despite your thorough effort, no such association has been made. I'm not quite sure how you would do that in general, but if you make sure to set up something like this:
df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12
Then you can run:
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year',facet_col_wrap=1)
This gives a plot in which January and February of 2019 now appear on the x-axis of the 2018 facet, and so on for the remaining years. I hope this is what you were looking for. Don't hesitate to let me know if not.
import pandas as pd
import numpy as np
import plotly.express as px
rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']
month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")
df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()
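The hard-coded fiscal_year list above only works for one particular start date and length. Since the question mentions that the series length and the start and end months are arbitrary, the column could instead be derived from the dates. The following is a sketch under stated assumptions, not part of the original answer: FISCAL_START_MONTH is a hypothetical constant, the fiscal year is labelled by the calendar year in which it starts (matching the lists used in this thread), and freq="MS" is swapped in for "M" purely for illustration.

```python
import numpy as np
import pandas as pd

FISCAL_START_MONTH = 3  # assumption: fiscal year starts in March, as in the question

rng = np.random.default_rng(12345)
df = pd.DataFrame({"A": rng.integers(80, 100, size=36)})
df.index = pd.date_range("2018-01-01", periods=36, freq="MS")

# Months before the fiscal start belong to the fiscal year that began in the
# previous calendar year; all other months belong to the current one.
df["fiscal_year"] = np.where(
    df.index.month < FISCAL_START_MONTH,
    df.index.year - 1,
    df.index.year,
)

print(df["fiscal_year"].value_counts().sort_index())
```

For a series starting in January 2018 this reproduces the manual assignment from the question's update, [2017]*2+[2018]*12+[2019]*12+[2020]*10, without depending on the number of rows.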
The issue in this case appears to be that plotly does not respect the order of the categories in the pandas data series used for the x-axis unless specifically instructed to do so, as pointed out in the plotly forum and described in the plotly documentation. Using category_orders in the px.bar call allows us to override the default plotly assumption and create an x-axis that runs from the first month of the fiscal year to the last month of the fiscal year.
import pandas as pd
import numpy as np
import plotly.express as px
rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']
month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")
df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10
fig = px.bar(df, x = 'month', y = 'A',
             facet_col='fiscal_year',
             facet_col_wrap=1,
             category_orders={  # replaces plotly's default order for the named column
                 "month": month_categories
             })
fig.show()
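The month list above is written out by hand for a March start. For a fiscal year beginning in some other month, the order could be generated instead. This is a small sketch, assuming an English locale (so that calendar.month_abbr matches the '%b' strings produced by strftime); fiscal_month_order is a hypothetical helper name, not a pandas or plotly function.

```python
import calendar

def fiscal_month_order(start_month):
    """Return the 12 month abbreviations starting at start_month (1-12), wrapping around."""
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be between 1 and 12")
    months = list(calendar.month_abbr)[1:]  # index 0 of month_abbr is an empty string
    return months[start_month - 1:] + months[:start_month - 1]

month_order = fiscal_month_order(3)
print(month_order)
```

The resulting list can be passed both as categories to pd.Categorical and as category_orders={"month": month_order} in the px.bar call, so the two orderings cannot drift apart.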