
Pandas resample to return NaN when all values are NaN

I'm using resample to sum my data into hourly blocks. When all input data for the hour is NaN, resample is producing a value of 0 instead of NaN.

My raw data is this:

infile
Out[206]:
             Date_time  Rainfall
0  2019-02-02 14:18:00       NaN
1  2019-02-02 14:20:00       NaN
2  2019-02-02 14:25:00       NaN
3  2019-02-02 14:30:00       NaN
4  2019-02-02 14:35:00       NaN
5  2019-02-02 14:40:00       NaN
6  2019-02-02 14:45:00       NaN
7  2019-02-02 14:50:00       NaN
8  2019-02-02 14:55:00       NaN
9  2019-02-02 15:00:00       0.0
10 2019-02-02 15:05:00       NaN
11 2019-02-02 15:10:00       NaN
12 2019-02-02 15:15:00       NaN
13 2019-02-02 15:20:00       NaN
14 2019-02-02 15:25:00       NaN
15 2019-02-02 15:30:00       NaN
16 2019-02-02 15:35:00       NaN
17 2019-02-02 15:40:00       NaN
18 2019-02-02 15:45:00       NaN
19 2019-02-02 15:50:00       NaN
20 2019-02-02 15:55:00       NaN

I want my output to be this:

             Date_time  Rainfall  
0  2019-02-02 14:18:00       NaN
1  2019-02-02 15:00:00       0.0

But instead I'm getting this:

output[['Date_time', 'Rainfall']]
Out[208]: 
                Date_time  Rainfall
0     2019-02-02 14:18:00       0.0
1     2019-02-02 15:00:00       0.0

This is the code that I'm using to get there. It's a little more complicated than it needs to be for this example because I use it to iterate through a list of column names at other points:

def sum_calc(col_name):
    col =  infile[['Date_time', col_name]].copy()
    col.columns = ('A', 'B')
    col = col.resample('H', on='A').B.sum().reset_index(drop=True)
    output[col_name] = col.copy()

sum_calc('Rainfall')

Any clues on how to get this to work? I've had a look online and all the options seem to produce NaN if any value in the group is NaN, rather than only when all values are NaN, which is what I'm after.

Try:

>>> df.resample("H", on="Date_time")["Rainfall"].agg(pd.Series.sum, min_count=1)
Date_time
2021-12-17 14:00:00    NaN
2021-12-17 15:00:00    0.0
Freq: H, Name: Rainfall, dtype: float64
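
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample frame is made up to mirror the question's data (a few rows per hour rather than all 21), and `hourly_sum` is a hypothetical helper, not part of pandas. It relies on the `min_count` parameter of `sum`: the result is NaN when a group has fewer than that many non-NaN values, and the default `min_count=0` is why an all-NaN hour sums to 0. The rule is written as "60min" rather than "H" only to sidestep differences in the hourly alias between pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up sample mirroring the question: the 14:00 hour is all NaN,
# the 15:00 hour holds a single 0.0 reading among NaNs.
df = pd.DataFrame({
    "Date_time": pd.to_datetime([
        "2019-02-02 14:18", "2019-02-02 14:20", "2019-02-02 14:55",
        "2019-02-02 15:00", "2019-02-02 15:05", "2019-02-02 15:55",
    ]),
    "Rainfall": [np.nan, np.nan, np.nan, 0.0, np.nan, np.nan],
})

# Default behaviour: NaNs are skipped, and the sum of nothing is 0.
default_sum = df.resample("60min", on="Date_time")["Rainfall"].sum()

# With min_count=1, a bin needs at least one non-NaN value to get a number.
hourly = df.resample("60min", on="Date_time")["Rainfall"].sum(min_count=1)


def hourly_sum(frame, col_name, time_col="Date_time"):
    """Hypothetical helper: hourly sum of one column, NaN for all-NaN hours."""
    return frame.resample("60min", on=time_col)[col_name].sum(min_count=1)


print(default_sum)
print(hourly)
```

Calling `hourly_sum(df, "Rainfall")` in a loop over column names should fit the pattern in the question's `sum_calc`, since only the `.sum()` call changes.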
