How to return a pandas dataframe with a rolling window but without an additional function applied to it?

Question

I have a dataframe and I want to ignore (replace with NaN) the values that do not have enough non-NaN values in their rolling window. An example dataframe can be recreated with:

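import numpy as np
import pandas as pd

# cast to float so the columns can hold NaN without a dtype change
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD')).astype(float)
for col in df.columns:
    df.loc[df.sample(frac=0.25).index, col] = np.nan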

       A     B     C     D
0   38.0  39.0   NaN  82.0
1   44.0  47.0   NaN   NaN
2    NaN  24.0  67.0   NaN
3   96.0   NaN   NaN  68.0
4   53.0   NaN  27.0  93.0

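I want to create a rolling window of width 4 and, for each window, keep the value only if the window contains at least min_periods non-NaN values.

I thought this would be trivial, simply using: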

df.rolling(4, min_periods=2).apply(lambda x: x)

However, apply does not seem to accept such a lambda function, and it fails with pandas.core.base.DataError: No numeric types to aggregate.

Answer 1

You can iterate over the windows and keep only those that contain a certain number of NaN values (or the opposite).

windowed_ds = df.rolling(4,min_periods=2)
windows_2_keep = []
for w in windowed_ds:
    # total nan values in window
    total_is_na_in_window = w.isna().sum().sum()
    # keep only windows with more than 2 nan values
    if total_is_na_in_window >2:
        windows_2_keep.append(w)
    # we can also do operations like mean or sum on each window
    # window_mean = w.mean().mean()
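
The loop above collects whole windows. As a minimal sketch of how the same iteration could give one NaN count per row instead, the example below uses a small hypothetical frame (`toy` is not the question's data) and assumes a pandas version that supports iterating over a rolling object (1.1 or later, as far as I know).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame for illustration only.
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0],
    "B": [np.nan, 2.0, np.nan, 4.0, np.nan, np.nan],
})

# One window per row; the first windows are shorter than 4 rows.
# Count the NaN values across all columns of each window.
nan_counts = pd.Series(
    [int(w.isna().sum().sum()) for w in toy.rolling(4)],
    index=toy.index,
)

# Rows whose window holds more than 2 NaN values, as in the loop above.
rows_to_keep = nan_counts[nan_counts > 2].index
```

Because `nan_counts` shares the index of `toy`, it can be used directly as a row filter, for example `toy.loc[rows_to_keep]`.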

Answer 2

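Another solution is to apply a custom function to the window that counts the NaN values in the whole window and performs whatever aggregation you need based on a condition. This is much faster than the for loop.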

windowed_ds = df.rolling(4,min_periods=2)

def agg_function(ser):
    nan_counts = df.loc[ser.index].isna().sum().sum()
    print('window',df.loc[ser.index])
    
    print(nan_counts)
    # take the mean only if the window has more than 2 NaN values
    if nan_counts>2:
        print('window mean',df.loc[ser.index].mean().mean())
        print('--------------')
        return df.loc[ser.index].mean().mean()
    else:
        print('window mean',0)
        print('--------------')
        return 0
# Returns a Series (or a DataFrame, depending on the aggregation function)
# with the aggregation result of each window. Column "A" is an arbitrary
# choice: it only drives how many times apply calls agg_function.

result = windowed_ds.A.apply(agg_function, raw=False)
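
If the goal is only to blank out values whose window has too few non-NaN entries, a vectorized mask may be enough and avoids apply altogether. The sketch below is one possible reading of the question, not the method of either answer: it counts non-NaN values per column in each trailing window and keeps a value only where that count reaches the threshold. The frame `toy` and the names `valid_counts` and `masked` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame for illustration only.
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0],
    "B": [np.nan, 2.0, np.nan, 4.0, np.nan, np.nan],
})

window, min_periods = 4, 2

# Per column: number of non-NaN values in the trailing window ending at each row.
# notna() has no NaN of its own, so min_periods=1 always yields a count.
valid_counts = toy.notna().astype(float).rolling(window, min_periods=1).sum()

# Keep a value only where its window holds enough non-NaN values;
# everything else becomes NaN. Values that were already NaN stay NaN.
masked = toy.where(valid_counts >= min_periods)
```

The result has the same shape and index as the input, so it can replace the original dataframe in later steps.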

Answer 1
0 2021-03-09 14:14:22

Answer 2
0 2021-03-10 15:31:49