
Create facet plot of pandas dataframe with month of year on x-axis

I have a data series consisting of monthly sales for individual fiscal years. I am using a pandas dataframe to store the data. Each fiscal year starts on the first day of March and ends on the last day of February in the following year. I am using a plotly facet plot to show the months of the year vertically aligned, so that March 2021 is below March 2020, and so on.

Despite using a categorical variable for the x-axis, the ordering is slightly off. I have tried sorting using a 'yearmon' variable with unique values, but that doesn't work either. Specifically, in the plot below the values for Jan and Feb in 2018 are blank, and Jan and Feb 2021 are also out of place. How can I get the facet plot to show contiguous data without these problems? Edit: I have a feeling it is related to the ordering of the categories, but haven't managed to pin it down yet.

[Figure: facet plot using plotly and a pandas dataframe]

import pandas as pd
import numpy as np
import plotly.express as px
import chart_studio.plotly as py

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

fig = px.bar(df, x = 'month', y = 'A', facet_col='year', facet_col_wrap=1)
py.image.save_as(fig, 'plotly.png', width=1000, height=500)

UPDATE

Using @vestland's code below as a base, I have tweaked the start date and the fiscal year assignment as per my comment below, because fiscal years are often not aligned with the calendar year. Also, the length of the data series is arbitrary (it might be a few months, it might be a decade), and so are the start and end months. Finally, I would like the x-axis to begin and end with the first and last months of the fiscal year, so in this case (March and February) 'Mar' should be the first tick mark on the left, and 'Feb' the last one on the right. My apologies if this was not sufficiently clear.

import pandas as pd
import numpy as np
import plotly.express as px

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()
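The hardcoded list [2017]*2+[2018]*12+[2019]*12+[2020]*10 only fits this exact 36-month span starting in January 2018. As a minimal sketch, assuming fiscal years are labeled by the calendar year in which they start (the convention used in this thread, where Jan and Feb 2018 fall in fiscal year 2017), the label can instead be computed from the dates for any start month and any series length. fiscal_year is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def fiscal_year(dates: pd.DatetimeIndex, start_month: int = 3) -> np.ndarray:
    """Hypothetical helper: label each date with the calendar year its fiscal year starts in.

    A fiscal year starting in `start_month` runs until the month before
    `start_month` in the following calendar year, so any month earlier than
    `start_month` belongs to the previous year's fiscal year.
    """
    years = np.asarray(dates.year)
    before_start = np.asarray(dates.month) < start_month
    return years - before_start.astype(int)


# Same span as the UPDATE: 36 months starting January 2018 ("MS" = month start)
idx = pd.date_range("2018-01-01", periods=36, freq="MS")
fy = fiscal_year(idx)
print(fy[:3])  # [2017 2017 2018]
```

With the dataframe above this would be df['fiscal_year'] = fiscal_year(df.index) in place of the hardcoded list; passing start_month=1 reduces it to plain calendar years.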

This seems to give the following: [Figure: plot using non-calendar fiscal years]

If I understand correctly, then you seem to be doing everything right besides one minor detail. That's a bit surprising, so there's a fair chance I've misunderstood the premise of your question. Anyway...

"Specifically, in the plot below the values for Jan and Feb in 2018 are blank"

That's because no such dates exist in your dataframe, as df.head() shows:

             A  year month monthindex yearmon
2018-03-31  93  2018   Mar         03  201803
2018-04-30  84  2018   Apr         04  201804
2018-05-31  95  2018   May         05  201805
2018-06-30  86  2018   Jun         06  201806
2018-07-31  84  2018   Jul         07  201807
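A minimal pandas-only sketch of the same point: cross-tabulating calendar year against month shows which bars each calendar-year facet in the question's plot can contain. It rebuilds only the year and month labels for the question's 36 months (March 2018 to February 2021); freq="MS" is an assumption chosen so the sketch runs on pandas versions both before and after the 2.2 change to month-end aliases, and '%b' assumes an English/C locale.

```python
import pandas as pd

month_categories = ['Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug',
                    'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb']

# Same 36 months as the question: March 2018 to February 2021 (month starts)
idx = pd.date_range("2018-03-01", periods=36, freq="MS")
df = pd.DataFrame({"year": idx.strftime("%Y"), "month": idx.strftime("%b")})

# Rows = calendar-year facets, columns = x-axis months in fiscal order,
# cells = number of bars that facet has for that month
counts = pd.crosstab(df["year"], df["month"]).reindex(columns=month_categories, fill_value=0)
print(counts)
```

The 2018 row has zeros under Jan and Feb, and the 2021 row has bars only under Jan and Feb, which matches the blanks and stray bars described in the question.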

And if I understand your intentions correctly, you would in fact like to associate January and February of 2019 with the first x-axis. Despite your thorough effort, no such association has been made. I'm not quite sure how you would do that in general, but if you make sure to set up something like this:

df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12

And get:

[image]

Then you can run

fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year',facet_col_wrap=1)

And get:

[Figure: the resulting facet plot by fiscal year]

As you can see, January and February of 2019 now appear on the x-axis of 2018, and so on for the rest of the years. I hope this is what you were looking for. Don't hesitate to let me know if not.

Complete code:

import pandas as pd
import numpy as np
import plotly.express as px

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()

The issue in this case appears to be that plotly does not respect the order of the categories in the pandas data series used for the x-axis unless specifically instructed to do so, as pointed out in the plotly forum here, and documented here. Using category_orders in the px.bar call allows us to override plotly's default ordering (the order in which values are first encountered in the data) and create an x-axis that runs from the first month of the specified fiscal year to its last month.
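A minimal pandas-only sketch of the two orders involved, using the same 36 months as the UPDATE (January 2018 onwards). The Categorical declares Mar first, but the labels are first encountered starting with Jan; per the plotly express documentation for category_orders, first-encounter order is what plotly uses by default. freq="MS" is an assumption chosen so the sketch runs on pandas versions before and after 2.2, and '%b' assumes an English/C locale.

```python
import pandas as pd

month_categories = ['Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug',
                    'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb']

# Same span as the UPDATE: 36 months starting January 2018 (month starts)
idx = pd.date_range("2018-01-01", periods=36, freq="MS")
month = pd.Series(pd.Categorical(idx.strftime("%b"), categories=month_categories))

# Order declared on the pandas Categorical
declared_order = list(month.cat.categories)
# Order in which each label is first encountered in the data
first_seen_order = list(dict.fromkeys(month.astype(str)))

print(declared_order[:3])    # ['Mar', 'Apr', 'May']
print(first_seen_order[:3])  # ['Jan', 'Feb', 'Mar']
```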

import pandas as pd
import numpy as np
import plotly.express as px

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
# 36 month-end dates starting January 2018 (pandas >= 2.2 prefers freq="ME")
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

# Fiscal year runs from March to February
month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

# Jan and Feb 2018 belong to FY2017; FY2020 is partial (Mar to Dec 2020)
df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10

fig = px.bar(df, x='month', y='A',
             facet_col='fiscal_year',
             facet_col_wrap=1,
             # replaces plotly's default order (order of first appearance in df)
             category_orders={"month": month_categories})
fig.show()

[Figure: facet plot of a pandas dataframe using ordered categories]
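The answer hardcodes the month list for a fiscal year starting in March. Since the UPDATE says the start month is arbitrary, here is a sketch that derives the list from the fiscal start month using the standard-library calendar module. fiscal_month_order is a hypothetical helper name; like strftime('%b'), calendar.month_abbr follows the current locale, so the two only agree when they run under the same locale (an English/C locale is assumed here).

```python
import calendar


def fiscal_month_order(start_month: int = 3) -> list[str]:
    """Hypothetical helper: abbreviated month names, rotated so the fiscal year's first month comes first."""
    # calendar.month_abbr[1] is 'Jan' ... [12] is 'Dec' (index 0 is an empty string)
    return [calendar.month_abbr[(start_month - 1 + i) % 12 + 1] for i in range(12)]


print(fiscal_month_order(3))
# ['Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb']
```

The same list could then be used both as categories= for pd.Categorical and as category_orders={"month": fiscal_month_order(3)} in the px.bar call, so the axis order follows whatever start month is chosen.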

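Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.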

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM