How to get a histogram with ranges of dates at equal intervals on the X-axis?
I have a dataframe that looks something like this:
col_1 | col_2 | col_3 | col_4
-----------------------------
A | 11 | 4/12/2017 | "aaa"
B | 22 | 4/04/2003 | "bbb"
C | 98 | 4/11/1905 | "ccc"
.... ... ... ... ... ... ..
.... ... ... ... ... ... ..
.... ... ... ... ... ... ..
Y | 101 | 8/12/1950 | "ddd"
I am trying to draw a histogram with ranges of years on the X axis and frequency on the Y axis.
For example:
If I pass year = 5 as the argument to my function (which would draw the plot), it should create the histogram with the frequency of values in [starting_date (of col_3), starting_date + 5 years] as the first bar, then the end of that range + 5 years as the next bar, and so on until the last date is reached.
Each bar should count the values falling in that range of dates.
My approach:
I have tried to use pd.interval_range + pd.cut, but it didn't seem to work for me.
interval = pd.interval_range(start=df["Resignation Date"].min(),end=df["Resignation Date"].max(),freq='5Y')
pd.cut(df['Resignation Date'], bins=interval)  # <-- This doesn't create intervals of 5 years range
When I try to plot the above, it says:
TypeError: no numeric data to plot
Any help appreciated.
One approach is to convert the datetime objects in the "Resignation Date" column to numbers, using matplotlib's date2num function. The number of bins can then be calculated as (max - min) / (365 * years_interval), as shown in the function below:
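import matplotlib.pyplot as plt
from matplotlib.dates import date2num

def plot_histogram(df_date_column, years_interval):
    # Convert the dates to floats (days since matplotlib's epoch)
    dates_as_numbers = date2num(df_date_column)
    # Approximate bin width in days (ignores leap years)
    days_interval = years_interval * 365
    # Use at least one bin, even if the data spans less than one interval
    num_bins = max(1, int(round((dates_as_numbers.max() - dates_as_numbers.min()) / days_interval)))
    plt.hist(df_date_column, bins=num_bins, ec='k')
    plt.gcf().autofmt_xdate()  # rotate the date tick labels
    plt.xlabel('Year')
    plt.ylabel('Count')
    plt.show()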
As an example, for the following dataframe, the function produces the following plot:
Country Date Profit
0 South Africa 2012-07-28 3839.13
1 Morocco 2013-10-19 338631.84
2 Papua New Guinea 2015-06-04 20592.00
3 Djibouti 2017-07-02 41273.28
4 Slovakia 2016-12-04 62217.18
.. ... ... ...
95 Liberia 2015-06-12 126918.64
96 Turkmenistan 2017-05-14 297783.20
97 Malawi 2016-03-12 291376.80
98 Vanuatu 2014-08-05 503279.79
99 Mali 2015-12-07 353819.26
plot_histogram(df['Date'], years_interval=2)
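The function above splits the date range into equal-width bins of roughly years_interval * 365 days, so the bin edges will not fall exactly on anniversaries of the first date. If exact calendar-year bins are wanted, as the question describes, one possible alternative is to build the edges explicitly with pd.DateOffset and count with pd.cut. This is only a sketch: the helper name count_by_year_bins is hypothetical, and half-open bins [start, end) are an assumption about how boundary dates should be assigned.

```python
import pandas as pd

def count_by_year_bins(dates, years):
    """Count dates in consecutive bins that are each `years` calendar years wide.

    Hypothetical helper: bins start at the earliest date and are
    half-open, [edge_i, edge_{i+1}).
    """
    dates = pd.Series(pd.to_datetime(dates))
    start = dates.min().normalize()

    # Add edges until the last edge lies strictly after the latest date.
    # Each edge is computed from `start` so Feb 29 starts do not drift.
    edges = [start]
    while edges[-1] <= dates.max():
        edges.append(start + pd.DateOffset(years=years * len(edges)))

    binned = pd.cut(dates, bins=pd.DatetimeIndex(edges), right=False)
    # sort_index keeps the bins in chronological order, empty bins included
    return binned.value_counts().sort_index()

# Small made-up sample, not the dataframe from the question
sample = ["2000-01-01", "2003-06-15", "2005-01-01", "2011-12-31"]
counts = count_by_year_bins(sample, years=5)
print(counts)
```

Calling counts.plot.bar() on the result should then draw one bar per interval, with the interval labels on the X axis.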