

How to resample without skipping NaN values in pandas

I am trying to get the 10-day aggregate of my data, which contains NaN values. The 10-day sum should be NaN if there is a NaN value anywhere in that 10-day window.

When I apply the code below, pandas treats NaN as zero and returns the sum of the remaining days.

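import pandas as pd

# data: 1-D array of daily values (may contain NaN); start_date: e.g. '2011-06-01'
dateRange = pd.date_range(start_date, periods=len(data), freq='D')
# Wrap the values in a date-indexed DataFrame so pandas can resample it
base_Series = pd.DataFrame(data, index=dateRange)
# Aggregate into 10-day bins (how='sum' is no longer accepted by current pandas);
# sum() skips NaN by default, which is why NaN behaves like zero here
agg_series = base_Series.resample('10D').sum()
agg_data = agg_series.values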
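To reproduce the behavior with the sample data listed below, here is a minimal sketch. It assumes a recent pandas, where resample(...).sum() replaces how='sum'; the names values and default_sum are illustrative.

```python
import numpy as np
import pandas as pd

# Daily values copied from the sample data in this question
values = [46.520536, 8.988311, 0.133823, 0.274521, 1.283360,
          2.556313, 0.027461, 0.001584, 0.079193, 2.389549,
          np.nan, 0.195844, 0.058720, 6.570925, 0.015107,
          0.031066, 0.073008, 0.072198, 0.044534, 0.240080]
s = pd.Series(values, index=pd.date_range('2011-06-01', periods=len(values), freq='D'))

# The default resample sum skips NaN, so the second bin still gets a number
default_sum = s.resample('10D').sum()
print(default_sum)
```

The second bin comes out as a number (about 7.301482 from the rounded sample values) instead of NaN, matching the output shown in the question.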

Sample data:

2011-06-01  46.520536
2011-06-02   8.988311
2011-06-03   0.133823
2011-06-04   0.274521
2011-06-05   1.283360
2011-06-06   2.556313
2011-06-07   0.027461
2011-06-08   0.001584
2011-06-09   0.079193
2011-06-10   2.389549
2011-06-11        NaN
2011-06-12   0.195844
2011-06-13   0.058720
2011-06-14   6.570925
2011-06-15   0.015107
2011-06-16   0.031066
2011-06-17   0.073008
2011-06-18   0.072198
2011-06-19   0.044534
2011-06-20   0.240080

Output (the second 10-day bin contains a NaN, so it should be NaN instead of 7.301481):

2011-06-01  62.254651
2011-06-11   7.301481

This uses NumPy's sum, which returns NaN if any of the values being summed is NaN:

In [34]: import numpy as np; import pandas as pd

In [35]: s = pd.Series(np.random.randn(100), index=pd.date_range('20130101', periods=100))

In [36]: s.iloc[11] = np.nan

In [37]: s.resample('10D').apply(lambda x: x.values.sum())
Out[37]: 
2013-01-01    6.910729
2013-01-11         NaN
2013-01-21   -1.592541
2013-01-31   -2.013012
2013-02-10    1.129273
2013-02-20   -2.054807
2013-03-02    4.669622
2013-03-12    3.489225
2013-03-22    0.390786
2013-04-01   -0.005655
dtype: float64
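Applied to the question's sample data, the same idea might look like this sketch. It assumes a recent pandas, where resample() no longer accepts how= and the callable is passed to Resampler.apply instead; the name nan_aware is illustrative.

```python
import numpy as np
import pandas as pd

# Daily values copied from the question's sample data (one NaN on 2011-06-11)
values = [46.520536, 8.988311, 0.133823, 0.274521, 1.283360,
          2.556313, 0.027461, 0.001584, 0.079193, 2.389549,
          np.nan, 0.195844, 0.058720, 6.570925, 0.015107,
          0.031066, 0.073008, 0.072198, 0.044534, 0.240080]
s = pd.Series(values, index=pd.date_range('2011-06-01', periods=len(values), freq='D'))

# x.values is a NumPy array; ndarray.sum() propagates NaN instead of skipping it
nan_aware = s.resample('10D').apply(lambda x: x.values.sum())
print(nan_aware)
```

The first bin keeps its sum (62.254651) and the bin containing the NaN becomes NaN. Note that an empty bin would give 0.0 here, since summing an empty array returns zero.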

To filter out the days that contain any NaNs, I suggest:

# Group rows by calendar day and keep only the days without any NaN
noNaN_days_only = s.groupby(s.index.date).filter(lambda x: not x.isnull().values.any())

where s is a DataFrame with a DatetimeIndex.
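A sketch of the filtering idea on made-up 12-hourly data (the values and the day with a missing reading are hypothetical). It groups on s.index.date because Timestamp.date is a method, so lambda x: x.date would hand the grouper bound methods rather than dates.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-hourly readings over three days; one reading on 2011-06-02 is missing
idx = pd.date_range('2011-06-01', periods=6, freq=pd.offsets.Hour(12))
s = pd.DataFrame({'value': [1.0, 2.0, np.nan, 4.0, 5.0, 6.0]}, index=idx)

# Group rows by calendar day and keep only the days with no NaN in any column
noNaN_days_only = s.groupby(s.index.date).filter(lambda g: not g.isnull().values.any())
print(noNaN_days_only)
```

Both rows of 2011-06-02 are dropped. Because those days disappear rather than becoming NaN, a later resample sum would again cover only the remaining days.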

Just apply an agg function:

import numpy as np
agg_series = base_Series.resample('10D').agg(lambda x: np.nan if np.isnan(x).any() else np.sum(x))  # NaN if any value in the bin is NaN
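A self-contained sketch of the agg approach on the question's sample data, assuming a recent pandas (base_Series is rebuilt here from the sample values). It tests np.isnan(x).any(), so a bin becomes NaN as soon as one value is missing; testing .all() instead would flag only bins that are entirely NaN, because np.sum on a pandas Series skips NaN.

```python
import numpy as np
import pandas as pd

# Daily values copied from the question's sample data (one NaN on 2011-06-11)
values = [46.520536, 8.988311, 0.133823, 0.274521, 1.283360,
          2.556313, 0.027461, 0.001584, 0.079193, 2.389549,
          np.nan, 0.195844, 0.058720, 6.570925, 0.015107,
          0.031066, 0.073008, 0.072198, 0.044534, 0.240080]
dateRange = pd.date_range('2011-06-01', periods=len(values), freq='D')
base_Series = pd.DataFrame(values, index=dateRange)

# The lambda receives each column of each 10-day bin as a Series
agg_series = base_Series.resample('10D').agg(
    lambda x: np.nan if np.isnan(x).any() else np.sum(x))
print(agg_series)
```

If every bin is guaranteed to hold exactly 10 daily rows, base_Series.resample('10D').sum(min_count=10) is a vectorised alternative: any bin with fewer than 10 non-NaN values becomes NaN, which also means a shorter final bin would come out as NaN.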

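Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.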

 
© 2020-2024 STACKOOM.COM