
How to get a histogram with ranges of dates at equal intervals on the X-axis?

I have a dataframe that looks something like this:

col_1 | col_2 | col_3     | col_4
------------------------------------
  A   |  11   | 4/12/2017 | "aaa"
  B   |  22   | 4/04/2003 | "bbb"
  C   |  98   | 4/11/1905 | "ccc"
 ...  |  ...  |    ...    |  ...
  Y   |  101  | 8/12/1950 | "ddd"

I am trying to draw a histogram with ranges of years on the X axis and frequency on the Y axis.

For example:

If I pass year = 5 as the argument to my function (which would draw the plot), it should create a histogram whose first bar shows the frequency of values in [starting_date (of col_3), starting_date + 5 years], whose next bar covers the following 5 years, and so on until the last date is reached.

Each bar should count the values falling in that range of dates.

My approach:

I have tried to use pd.interval_range + pd.cut, but it didn't seem to work for me.

interval = pd.interval_range(start=df["Resignation Date"].min(), end=df["Resignation Date"].max(), freq='5Y')
pd.cut(df['Resignation Date'], bins=interval)  # <-- This doesn't create intervals of 5 years range

When I try to plot the above, it says: TypeError: no numeric data to plot

Any help appreciated.

One approach is to convert the datetime objects in the "Resignation Date" column to numbers using matplotlib's date2num function, which returns each date as a number of days since an epoch. The number of bins can then be calculated as (max - min) / (365 * years_interval), as shown in the function below:

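import matplotlib.pyplot as plt
from matplotlib.dates import date2num

def plot_histogram(df_date_column, years_interval):
    # date2num converts the dates to floats measured in days
    dates_as_numbers = date2num(df_date_column)
    days_interval = years_interval * 365
    # Number of equal-width bins, each roughly years_interval years wide (at least one)
    num_bins = max(1, round((dates_as_numbers.max() - dates_as_numbers.min()) / days_interval))

    plt.hist(df_date_column, bins=num_bins, ec='k')
    plt.gcf().autofmt_xdate()  # rotate the date tick labels so they don't overlap
    plt.xlabel('Year')
    plt.ylabel('Count')
    plt.show()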
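Note that passing an integer to bins splits the data range into equal-width bins, so each bar is only approximately years_interval years wide and the edges do not fall exactly on starting_date + 5 years, starting_date + 10 years, and so on. If exact edges matter, one option is to build them explicitly and pass them to hist. The following is a minimal sketch under that assumption; the name plot_histogram_exact and its return values are hypothetical and not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import date2num


def plot_histogram_exact(df_date_column, years_interval):
    """Histogram whose bin edges are exactly `years_interval` calendar years apart.

    Hypothetical variant of plot_histogram; returns the axes, edges and counts
    instead of calling plt.show().
    """
    dates = pd.to_datetime(pd.Series(df_date_column))
    step = pd.DateOffset(years=years_interval)

    # Edges start at the earliest date. Extending `end` by one step ensures
    # the last edge lies past the latest date, so every value lands in a bin.
    edges = pd.date_range(start=dates.min(), end=dates.max() + step, freq=step)

    # Convert both the data and the edges to matplotlib's numeric date format.
    edge_numbers = date2num(edges.to_numpy())
    fig, ax = plt.subplots()
    counts, _, _ = ax.hist(date2num(dates.to_numpy()), bins=edge_numbers, ec="k")

    ax.xaxis_date()              # treat the numeric x values as dates
    ax.set_xticks(edge_numbers)  # put one tick on every bin edge
    fig.autofmt_xdate()
    ax.set_xlabel("Year")
    ax.set_ylabel("Count")
    return ax, edges, counts
```

Calling plot_histogram_exact(df['Date'], years_interval=2) followed by plt.show() should display the figure; the returned counts hold the height of each bar.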

As an example, for the following dataframe, the function produces the following plot:

             Country       Date     Profit
0       South Africa 2012-07-28    3839.13
1            Morocco 2013-10-19  338631.84
2   Papua New Guinea 2015-06-04   20592.00
3           Djibouti 2017-07-02   41273.28
4           Slovakia 2016-12-04   62217.18
..               ...        ...        ...
95           Liberia 2015-06-12  126918.64
96      Turkmenistan 2017-05-14  297783.20
97            Malawi 2016-03-12  291376.80
98           Vanuatu 2014-08-05  503279.79
99              Mali 2015-12-07  353819.26

plot_histogram(df['Date'], years_interval=2)

[Image: the resulting histogram]
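The pd.cut idea from the question can also be made to work if the goal is a frequency table per date range. pd.cut returns interval categories rather than numbers, which is likely why plotting its result directly raised the TypeError; counting the categories first gives numeric data. This is only a sketch: the function name count_by_year_interval is hypothetical, and the bins are taken as left-closed [start, start + N years) intervals, which is an assumption about the intended boundaries.

```python
import pandas as pd


def count_by_year_interval(dates, years_interval):
    """Count how many dates fall in each `years_interval`-year range.

    Hypothetical helper; bins are left-closed: [edge, next_edge).
    """
    dates = pd.to_datetime(pd.Series(dates))
    step = pd.DateOffset(years=years_interval)

    # Bin edges: earliest date, earliest date + N years, ... until past the latest date.
    edges = pd.date_range(start=dates.min(), end=dates.max() + step, freq=step)

    # Assign every date to its interval, then count per interval.
    # sort=False keeps the intervals in chronological order, including empty ones.
    binned = pd.cut(dates, bins=edges, right=False)
    return binned.value_counts(sort=False)


counts = count_by_year_interval(["2000-01-01", "2003-06-15", "2012-03-01"], 5)
print(counts)
```

Because the result is an ordinary numeric Series indexed by interval, counts.plot.bar() can draw it as a bar chart with one labelled bar per date range.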
