
How to return a pandas dataframe with a rolling window but no additional function applied to it?

I have a dataframe and I want to ignore (replace with NaN) any value whose rolling window does not contain enough non-NaN values. A sample dataframe can be recreated with:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
for col in df.columns:
    # blank out a random 25% of the rows in each column
    df.loc[df.sample(frac=0.25).index, col] = np.nan

       A     B     C     D
0   38.0  39.0   NaN  82.0
1   44.0  47.0   NaN   NaN
2    NaN  24.0  67.0   NaN
3   96.0   NaN   NaN  68.0
4   53.0   NaN  27.0  93.0

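I want to create a rolling window of width 4 and, for each window, keep the value only if the window contains at least min_periods non-NaN values.

I thought this would be trivial, just using: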

df.rolling(4, min_periods=2).apply(lambda x: x)

However, it seems that apply does not accept a lambda like this: it fails with pandas.core.base.DataError: No numeric types to aggregate.

You can iterate over the windows and keep only those that contain a certain number of NaN values (or the reverse).

# iterating over a Rolling object requires pandas >= 1.1
windowed_ds = df.rolling(4, min_periods=2)
windows_2_keep = []
for w in windowed_ds:
    # total NaN values in the window, across all columns
    total_is_na_in_window = w.isna().sum().sum()
    # keep only windows with more than 2 NaN values
    if total_is_na_in_window > 2:
        windows_2_keep.append(w)
    # we can also do operations like mean or sum on each window
    # window_mean = w.mean().mean()
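The loop above collects whole windows. To get back to what the question asks for (the original values, with NaN wherever the window is too sparse), the same iteration can build a boolean mask instead. The following is only a sketch under two assumptions that the original post does not spell out: "enough" is counted per column, and the small `df`, `WINDOW` and `MIN_PERIODS` here are illustrative stand-ins.

```python
import numpy as np
import pandas as pd

WINDOW, MIN_PERIODS = 4, 2  # same values as in the question

# small illustrative frame (an assumption, not the question's random data)
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0, 5.0, np.nan],
                   "B": [np.nan, np.nan, 3.0, np.nan, np.nan, np.nan]})

keep_rows = []
# iterating a Rolling object yields one sub-DataFrame per row (pandas >= 1.1);
# the first windows are shorter than WINDOW
for w in df.rolling(WINDOW):
    # per-column count of non-NaN values in this window
    enough = w.notna().sum() >= MIN_PERIODS
    keep_rows.append(enough.to_numpy())

# one boolean row per window, aligned with the original index and columns
keep = pd.DataFrame(np.array(keep_rows), index=df.index, columns=df.columns)

# where() keeps values where the mask is True and writes NaN elsewhere
result = df.where(keep)
print(result)
```

A value that is already NaN stays NaN even if its window passes the check, because `where` only copies the original values through.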

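Another solution is to apply a custom function to the window that counts the NaN values across the whole window and then performs whatever aggregation you need, depending on a condition. This is much faster than the for loop.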

windowed_ds = df.rolling(4,min_periods=2)

def agg_function(ser):
    nan_counts = df.loc[ser.index].isna().sum().sum()
    print('window',df.loc[ser.index])
    
    print(nan_counts)
    # take the mean only if the window has more than 2 NaN values
    if nan_counts>2:
        print('window mean',df.loc[ser.index].mean().mean())
        print('--------------')
        return df.loc[ser.index].mean().mean()
    else:
        print('window mean',0)
        print('--------------')
        return 0
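# Returns a Series holding the aggregation result of each window.
# Column "A" is an arbitrary choice: it only drives how many times agg_function
# is called, because the function looks up the full rows in df through ser.index
# (raw=False is needed so that ser is a Series carrying that index).
# Windows where "A" has fewer than min_periods non-NaN values yield NaN
# without agg_function being called.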

result = windowed_ds.A.apply(agg_function, raw=False)
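As for the original goal of getting the values back unchanged: `Rolling.apply` expects the function to reduce each window to a single number, which is why an identity lambda fails. One possible workaround, sketched below on an illustrative `df` (an assumption, not the question's data), is to return the last element of each window, which is the value at the current row. The vectorized version that follows is a cross-check of the same idea, with "enough" counted per column.

```python
import numpy as np
import pandas as pd

# small illustrative frame (an assumption, not the question's random data)
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0, 5.0, np.nan],
                   "B": [np.nan, np.nan, 3.0, np.nan, np.nan, np.nan]})

# With raw=True each window arrives as a NumPy array, so x[-1] is the value
# at the current row (which may itself be NaN). pandas skips the call and
# writes NaN when a window has fewer than min_periods non-NaN values.
kept = df.rolling(4, min_periods=2).apply(lambda x: x[-1], raw=True)

# Vectorized cross-check: count the non-NaN values per window, then mask.
counts = df.notna().astype(float).rolling(4, min_periods=1).sum()
kept_vectorized = df.where(counts >= 2)

print(kept)
```

The vectorized form avoids calling a Python function once per window, so it is likely the better choice on large frames.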

