
Find out how long a run of consecutive NaNs in a column is

I need to clean data (efficiently) according to this specific rule:

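If a column contains a run of 3 or fewer consecutive NaN values, fill that NaN "chain" in the df column with .fillna(method='ffill'). Otherwise leave the run untouched (it will be handled by another method).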
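For context, a plain forward fill with a `limit` does not implement this rule by itself: it fills the first few values of a long gap instead of leaving the whole gap alone. A minimal sketch on a hypothetical toy Series (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with a single gap of 4 consecutive NaNs.
s = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, 2.0])

# limit=3 caps how many NaNs are filled per gap; it does not skip long gaps.
partly_filled = s.ffill(limit=3)

# The first three NaNs of the 4-long gap are filled; only the last one remains.
print(partly_filled.tolist())
```

This is why the size of each NaN run has to be known before deciding whether to fill it.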

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[8001, 7999, 7998, np.nan, 9900, 9342, 9324, 8534, 8358, 9457, np.nan, 8999, 8492, np.nan, np.nan],
                   "B":[201, 209, 298, 300,np.nan, 342, 324, 854, 858, 457, 145, 189, 192, 134, 135],
                   "C":[11991, 15631, 47998, 38030, 19900, 29342, np.nan, np.nan, np.nan,np.nan, 27245, 28999, 28492, 29334, 28234]}, 
                   index=pd.Index(['2019-06-17 00:00:00','2019-06-17 00:01:01', '2019-06-17 00:02:00', '2019-06-17 00:03:04', 
                                   '2020-06-17 00:04:00', '2020-06-17 00:05:00', '2020-06-17 00:06:00', '2020-06-17 00:07:00',
                                   '2020-06-17 00:08:00','2020-06-17 00:09:00','2020-06-17 00:10:00','2020-06-17 00:11:00',
                                   '2020-06-17 00:12:00','2020-06-17 00:13:00', '2020-06-17 00:14:00']))

df

                 Time     A     B       C
'2019-06-17 00:00:00'  8001   201   11991
'2019-06-17 00:01:01'  7999   209   15631
'2019-06-17 00:02:00'  7998   298   47998
'2019-06-17 00:03:04'  NaN    300   38030
'2020-06-17 00:04:00'  9900   NaN   19900
'2020-06-17 00:05:00'  9342   342   29342
'2020-06-17 00:06:00'  9324   324     NaN
'2020-06-17 00:07:00'  8534   854     NaN
'2020-06-17 00:08:00'  8358   858     NaN
'2020-06-17 00:09:00'  9457   457     NaN
'2020-06-17 00:10:00'   NaN   145   27245
'2020-06-17 00:11:00'  8999   189   28999
'2020-06-17 00:12:00'  8492   192   28492
'2020-06-17 00:13:00'   NaN   134   29334
'2020-06-17 00:14:00'   NaN   135   28234

Expected result:

                 Time     A     B       C
'2019-06-17 00:00:00'  8001   201   11991
'2019-06-17 00:01:01'  7999   209   15631
'2019-06-17 00:02:00'  7998   298   47998
'2019-06-17 00:03:04'  7998   300   38030
'2020-06-17 00:04:00'  9900   300   19900
'2020-06-17 00:05:00'  9342   342   29342
'2020-06-17 00:06:00'  9324   324     NaN
'2020-06-17 00:07:00'  8534   854     NaN
'2020-06-17 00:08:00'  8358   858     NaN
'2020-06-17 00:09:00'  9457   457     NaN
'2020-06-17 00:10:00'  9457   145   27245
'2020-06-17 00:11:00'  8999   189   28999
'2020-06-17 00:12:00'  8492   192   28492
'2020-06-17 00:13:00'  8492   134   29334
'2020-06-17 00:14:00'  8492   135   28234

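Just determine the size of each group of consecutive NaNs and find out which groups are no larger than your maximum gap size. Then use that Boolean Series to mask the fully forward-filled column, so that only gaps of size less than or equal to the specified gap size actually get filled.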
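The labelling step can be looked at in isolation. The following is a minimal sketch on a hypothetical toy Series (the data and variable names are assumptions for illustration, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: one gap of length 1 and one gap of length 4.
s = pd.Series([1.0, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 3.0])

# Running count of valid values, kept only on the NaN rows. Every NaN in the
# same consecutive run gets the same label; valid rows become NaN.
labels = s.notnull().cumsum().where(s.isnull())

# Number of rows per label, i.e. the length of each NaN run
# (value_counts ignores the NaN labels of the valid rows by default).
run_sizes = labels.value_counts()

# Labels of the runs that are short enough to fill with a maximum gap of 3.
small_runs = run_sizes[run_sizes <= 3].index.tolist()
print(run_sizes.to_dict(), small_runs)
```

Here the run labelled 1 has length 1 and the run labelled 2 has length 4, so only the first would be filled.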

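def fwd_fill_gaps(df, col, gap_max):
    """ Fill consecutive NaN when size is <= gap_max """

    # Label each run of consecutive NaNs with the count of valid values before it;
    # rows that are not NaN get NaN as their label and fall out of the grouping
    s = df[col].notnull().cumsum().where(df[col].isnull())
    # Only True for NaN gaps of size <= gap_max
    s = s.groupby(s).transform('size').le(gap_max)

    # Forward-fill everything, keep the filled values only where the mask is True.
    # Note: the `downcast` keyword is deprecated in recent pandas releases.
    return df[col].fillna(df[col].ffill().where(s), downcast='infer')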


for col in ['A', 'B', 'C']:
    df[col] = fwd_fill_gaps(df, col, gap_max=3)

                        A    B        C
2019-06-17 00:00:00  8001  201  11991.0
2019-06-17 00:01:01  7999  209  15631.0
2019-06-17 00:02:00  7998  298  47998.0
2019-06-17 00:03:04  7998  300  38030.0
2020-06-17 00:04:00  9900  300  19900.0
2020-06-17 00:05:00  9342  342  29342.0
2020-06-17 00:06:00  9324  324      NaN
2020-06-17 00:07:00  8534  854      NaN
2020-06-17 00:08:00  8358  858      NaN
2020-06-17 00:09:00  9457  457      NaN
2020-06-17 00:10:00  9457  145  27245.0
2020-06-17 00:11:00  8999  189  28999.0
2020-06-17 00:12:00  8492  192  28492.0
2020-06-17 00:13:00  8492  134  29334.0
2020-06-17 00:14:00  8492  135  28234.0
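As a variation, the same idea can be applied to every column in one call with `DataFrame.apply`. This is only a sketch under stated assumptions: `ffill_small_gaps` and the toy frame are hypothetical, and the helper groups on the unmasked running count (so no NaN group keys are involved) and skips the `downcast` step, so filled columns stay float.

```python
import numpy as np
import pandas as pd

def ffill_small_gaps(col, gap_max):
    """Forward-fill only NaN runs of length <= gap_max (hypothetical helper)."""
    is_nan = col.isna()
    # Each valid value starts a new group that also contains the NaNs after it.
    run_id = col.notna().cumsum()
    # Number of NaNs in each group, broadcast back to every row of the group.
    run_len = is_nan.astype(int).groupby(run_id).transform("sum")
    # NaN rows that belong to a sufficiently short run.
    fillable = is_nan & (run_len <= gap_max)
    # Take the forward-filled value only on fillable rows.
    return col.where(~fillable, col.ffill())

# Hypothetical toy data: column A has short gaps, column C has a gap of 4.
frame = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan, np.nan],
    "C": [5.0, np.nan, np.nan, np.nan, np.nan],
})

# Extra keyword arguments to apply() are passed through to the helper.
result = frame.apply(ffill_small_gaps, gap_max=3)
print(result)
```

With `gap_max=3`, both gaps in column A are filled while the 4-long gap in column C is left as NaN.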
