
Pandas: Group by year and plot density

I have a data frame that contains some time-based data:

>>> temp.groupby(pd.TimeGrouper('AS'))['INC_RANK'].mean()
date
2001-01-01    0.567128
2002-01-01    0.581349
2003-01-01    0.556646
2004-01-01    0.549128
2005-01-01         NaN
2006-01-01    0.536796
2007-01-01    0.513109
2008-01-01    0.525859
2009-01-01    0.530433
2010-01-01    0.499250
2011-01-01    0.488159
2012-01-01    0.493405
2013-01-01    0.530207
Freq: AS-JAN, Name: INC_RANK, dtype: float64

Now I would like to plot the density for each year. The following command used to work for other data frames, but it fails here:

>>> temp.groupby(pd.TimeGrouper('AS'))['INC_RANK'].plot(kind='density')
ValueError: ordinal must be >= 1

Here is what that column looks like:

>>> temp['INC_RANK'].head()
date
2001-01-01    0.516016
2001-01-01    0.636038
2001-01-01    0.959501
2001-01-01         NaN
2001-01-01    0.433824
Name: INC_RANK, dtype: float64

I think this is caused by the NaN values in your data, since a density cannot be estimated from NaN. Because you only want to visualize the density, simply dropping the missing values should not be a big issue, provided the missing (unobserved) values follow the same distribution as the observed ones. Therefore, df.dropna().groupby(pd.TimeGrouper('AS'))['INC_RANK'].plot(kind='density') should suffice, where df is your data frame (temp in the question).
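A minimal sketch of the drop-then-group idea follows. It is not the exact command from the answer: pd.TimeGrouper was deprecated and later removed from pandas, so this version groups on the year of the index instead. The helper names yearly_samples and plot_yearly_density, the toy data, and the guard that skips years with fewer than two observations are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def yearly_samples(series):
    """Split a datetime-indexed Series into per-year samples with NaN removed.

    Years left with fewer than two observations are skipped (an added guard,
    not part of the original answer), since a kernel density estimate
    cannot be fitted to them.
    """
    clean = series.dropna()
    return {
        year: values
        for year, values in clean.groupby(clean.index.year)
        if len(values) >= 2
    }


def plot_yearly_density(series):
    """Draw one density curve per year (needs matplotlib and scipy)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for year, values in yearly_samples(series).items():
        values.plot(kind="density", ax=ax, label=str(year))
    ax.legend()
    return ax


# Hypothetical stand-in for the question's data: 2005 is entirely missing.
rng = np.random.default_rng(0)
dates = pd.to_datetime(["2004-01-01"] * 5 + ["2005-01-01"] * 3 + ["2006-01-01"] * 5)
values = rng.random(len(dates))
values[5:8] = np.nan  # every 2005 observation
values[1] = np.nan    # a single gap in 2004
temp = pd.DataFrame({"INC_RANK": values}, index=dates)

samples = yearly_samples(temp["INC_RANK"])
print(sorted(samples))  # years that still have data to plot
```

Calling plot_yearly_density(temp['INC_RANK']) would then draw the curves on a single set of axes. Dropping NaN on the one column, rather than calling dropna() on the whole frame, keeps rows that are only missing values in other columns.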

On the other hand, if the missing values are not 'unobserved' but are instead values outside the measuring range, then dropna() is probably not a good idea, because the dropped readings are not a random sample of the data. For example, a temperature sensor that reads 0 to 50 °F may occasionally encounter a temperature of 100 °F, send out an error code, and have that reading recorded as a missing value.
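One rough way to judge which situation applies is to look at how much of each year is missing before dropping anything. The sketch below uses a hypothetical sensor log (the readings and variable names are invented for illustration) and only reports the share of missing values per year; it cannot by itself tell unobserved values from out-of-range ones.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor log: NaN marks readings where the sensor sent an error code.
dates = pd.to_datetime(["2001-01-01"] * 4 + ["2002-01-01"] * 4)
readings = pd.Series(
    [10.0, np.nan, 30.0, 45.0, np.nan, np.nan, np.nan, 20.0],
    index=dates,
)

# Share of missing readings per year. A large or very uneven share suggests
# that dropping the NaN values may distort the plotted density.
missing_share = readings.isna().groupby(readings.index.year).mean()
print(missing_share)
```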
