Find out how long each run of consecutive NaNs in a column is
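I need to clean data (efficiently) according to this specific rule:
If a column contains a run of 3 or fewer consecutive NaNs,
fill that NaN "chain" in the df column with .fillna(method='ffill').
Otherwise leave the run alone (it will be handled by another method).
Example: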
import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[8001, 7999, 7998, np.nan, 9900, 9342, 9324, 8534, 8358, 9457, np.nan, 8999, 8492, np.nan, np.nan],
"B":[201, 209, 298, 300,np.nan, 342, 324, 854, 858, 457, 145, 189, 192, 134, 135],
"C":[11991, 15631, 47998, 38030, 19900, 29342, np.nan, np.nan, np.nan,np.nan, 27245, 28999, 28492, 29334, 28234]},
index=pd.Index(['2019-06-17 00:00:00','2019-06-17 00:01:01', '2019-06-17 00:02:00', '2019-06-17 00:03:04',
'2020-06-17 00:04:00', '2020-06-17 00:05:00', '2020-06-17 00:06:00', '2020-06-17 00:07:00',
'2020-06-17 00:08:00','2020-06-17 00:09:00','2020-06-17 00:10:00','2020-06-17 00:11:00',
'2020-06-17 00:12:00','2020-06-17 00:13:00', '2020-06-17 00:14:00']))
df
Time A B C
'2019-06-17 00:00:00' 8001 201 11991
'2019-06-17 00:01:01' 7999 209 15631
'2019-06-17 00:02:00' 7998 298 47998
'2019-06-17 00:03:04' NaN 300 38030
'2020-06-17 00:04:00' 9900 NaN 19900
'2020-06-17 00:05:00' 9342 342 29342
'2020-06-17 00:06:00' 9324 324 NaN
'2020-06-17 00:07:00' 8534 854 NaN
'2020-06-17 00:08:00' 8358 858 NaN
'2020-06-17 00:09:00' 9457 457 NaN
'2020-06-17 00:10:00' NaN 145 27245
'2020-06-17 00:11:00' 8999 189 28999
'2020-06-17 00:12:00' 8492 192 28492
'2020-06-17 00:13:00' NaN 134 29334
'2020-06-17 00:14:00' NaN 135 28234
Expected result:
Time A B C
'2019-06-17 00:00:00' 8001 201 11991
'2019-06-17 00:01:01' 7999 209 15631
'2019-06-17 00:02:00' 7998 298 47998
'2019-06-17 00:03:04' 7998 300 38030
'2020-06-17 00:04:00' 9900 300 19900
'2020-06-17 00:05:00' 9342 342 29342
'2020-06-17 00:06:00' 9324 324 NaN
'2020-06-17 00:07:00' 8534 854 NaN
'2020-06-17 00:08:00' 8358 858 NaN
'2020-06-17 00:09:00' 9457 457 NaN
'2020-06-17 00:10:00' 9457 145 27245
'2020-06-17 00:11:00' 8999 189 28999
'2020-06-17 00:12:00' 8492 192 28492
'2020-06-17 00:13:00' 8492 134 29334
'2020-06-17 00:14:00' 8492 135 28234
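Simply determine the size of each group of consecutive NaNs and find which groups are no larger than your maximum gap size. Then use that Boolean Series to mask the fully forward-filled column, so that only gaps of at most the specified size get filled.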
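To make the grouping step concrete, here is a minimal sketch on a small made-up Series. The sample values and the names `run_id`, `run_size`, `mask` and `gap_max = 2` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one NaN run of length 1 and one of length 4.
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 8.0])
gap_max = 2  # illustrative threshold

# Count the valid values seen so far, then keep that count only at NaN
# positions: every NaN in the same run ends up with the same label.
run_id = s.notna().cumsum().where(s.isna())

# Length of the run each NaN belongs to (only meaningful at NaN positions).
run_size = run_id.groupby(run_id).transform("size")

# True only for NaNs that sit in a run of at most gap_max elements.
mask = run_size.le(gap_max) & s.isna()

# Take forward-filled values only where the mask allows it.
filled = s.fillna(s.ffill().where(mask))
print(filled)
```

In this sketch only the single NaN after `1.0` is filled, while the run of four NaNs is left untouched.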
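def fwd_fill_gaps(df, col, gap_max):
    """Fill consecutive NaN when size is <= gap_max."""
    # Label each NaN with the running count of non-null values,
    # so all NaNs in the same run share one label
    s = df[col].notnull().cumsum().where(df[col].isnull())
    # Only True for NaN gaps of size <= gap_max
    s = s.groupby(s).transform('size').le(gap_max)
    # downcast='infer' restores integer dtype where possible
    # (note: this argument is deprecated in newer pandas releases)
    return df[col].fillna(df[col].ffill().where(s), downcast='infer')

for col in ['A', 'B', 'C']:
    df[col] = fwd_fill_gaps(df, col, gap_max=3)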
A B C
2019-06-17 00:00:00 8001 201 11991.0
2019-06-17 00:01:01 7999 209 15631.0
2019-06-17 00:02:00 7998 298 47998.0
2019-06-17 00:03:04 7998 300 38030.0
2020-06-17 00:04:00 9900 300 19900.0
2020-06-17 00:05:00 9342 342 29342.0
2020-06-17 00:06:00 9324 324 NaN
2020-06-17 00:07:00 8534 854 NaN
2020-06-17 00:08:00 8358 858 NaN
2020-06-17 00:09:00 9457 457 NaN
2020-06-17 00:10:00 9457 145 27245.0
2020-06-17 00:11:00 8999 189 28999.0
2020-06-17 00:12:00 8492 192 28492.0
2020-06-17 00:13:00 8492 134 29334.0
2020-06-17 00:14:00 8492 135 28234.0
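As a variation on the same idea, the sketch below processes a whole DataFrame and does not use the `downcast` argument, so filled columns stay as floats. The helper name `ffill_short_gaps` and the `demo` frame are hypothetical, introduced only for illustration. It also contrasts the result with a plain `ffill(limit=3)`, which partially fills longer gaps instead of skipping them.

```python
import numpy as np
import pandas as pd

def ffill_short_gaps(frame: pd.DataFrame, gap_max: int) -> pd.DataFrame:
    """Forward-fill only NaN runs of length <= gap_max, column by column."""
    out = frame.copy()
    for col in out.columns:
        s = out[col]
        is_nan = s.isna()
        # The count of valid values is constant across each NaN run, so it
        # groups every valid value together with the NaNs that follow it.
        run_id = s.notna().cumsum()
        # Number of NaNs in the run each row belongs to.
        run_len = is_nan.astype(int).groupby(run_id).transform("sum")
        fill_here = is_nan & run_len.le(gap_max)
        # Keep original values, except where a short gap should be filled.
        out[col] = s.where(~fill_here, s.ffill())
    return out

# Hypothetical data: "A" has gaps of length 1 and 2, "C" has a gap of length 4.
demo = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0, np.nan, np.nan],
    "C": [10.0, np.nan, np.nan, np.nan, np.nan, 60.0],
})

result = ffill_short_gaps(demo, gap_max=3)  # the 4-long gap in "C" stays NaN
naive = demo.ffill(limit=3)                 # fills 3 of the 4 NaNs in "C"
print(result)
print(naive)
```

The comparison is the reason for the mask: `limit` caps how many values are filled per gap, but it does not leave long gaps untouched.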