How to find consecutive NaNs at the beginning/end of a pandas datetime series

Question

I have a time series mapping each day to the number of steps walked. I would like to fill missing values one way if they appear consecutively at the beginning of the date range I am examining, and another way if they are in the "middle" of the data or at the end. Is there something similar to str.startswith for identifying these consecutive NaNs?

Here's an example:

Original dataset:

               ID  Steps
Day                     
2019-07-25  53668    0.0
2019-07-26  53668    0.0
2019-07-27  53668    0.0
2019-07-28  53668  100.0
2019-07-29  53668    0.0
2019-07-30  53668    0.0
2019-07-31  53668    0.0
2019-08-01  53668  100.0
2019-08-02  53668    0.0
2019-08-03  53668    0.0
2019-08-04  53668    0.0
2019-08-05  53668    0.0

idx = pd.date_range('2019-07-20','2019-08-03')
df.reindex(idx, fill_value = np.nan)

yields:

                 ID  Steps
2019-07-20      NaN    NaN
2019-07-21      NaN    NaN
2019-07-22      NaN    NaN
2019-07-23      NaN    NaN
2019-07-24      NaN    NaN
2019-07-25  53668.0    0.0
2019-07-26  53668.0    0.0
2019-07-27  53668.0    0.0
2019-07-28  53668.0  100.0
2019-07-29  53668.0    0.0
2019-07-30  53668.0    0.0
2019-07-31  53668.0    0.0
2019-08-01  53668.0  100.0
2019-08-02  53668.0    0.0
2019-08-03  53668.0    0.0

How do I know that the NaN rows here are at the beginning, and not interspersed or at the end?
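To reproduce the setup above, here is a minimal self-contained sketch that rebuilds the sample frame from the table (the construction itself is an assumption; the values and the `Day` index name are taken from the table) and reindexes it to the wider range. Dates absent from the original index become all-NaN rows, and since `fill_value` already defaults to NaN in `DataFrame.reindex`, passing it explicitly is optional.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data: one row per day with ID and Steps.
days = pd.date_range('2019-07-25', '2019-08-05', name='Day')
steps = [0, 0, 0, 100, 0, 0, 0, 100, 0, 0, 0, 0]
df = pd.DataFrame({'ID': 53668, 'Steps': np.array(steps, dtype=float)}, index=days)

# Reindex to the range being examined; missing dates become NaN rows.
idx = pd.date_range('2019-07-20', '2019-08-03')
out = df.reindex(idx)  # fill_value defaults to NaN
print(out)
```

Note that `ID` turns into a float column after reindexing, because NaN cannot be stored in an integer column.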

Answer 1

No, there is no such function.
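While there is no direct equivalent, `Series.first_valid_index()` and `Series.last_valid_index()` return the labels of the first and last non-missing values, which is enough to mark the leading and trailing NaN runs by comparing the index against them. A minimal sketch on a made-up Series (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaN runs at the start, in the middle and at the end.
idx = pd.date_range('2019-07-20', periods=10)
s = pd.Series([np.nan, np.nan, 1, 2, np.nan, 3, 4, np.nan, np.nan, np.nan], index=idx)

start = s.first_valid_index()  # label of the first non-NaN value
end = s.last_valid_index()     # label of the last non-NaN value

leading = s.index < start      # True only for the NaN run at the beginning
trailing = s.index > end       # True only for the NaN run at the end
middle = s.isna() & ~leading & ~trailing  # all remaining NaNs

print(pd.DataFrame({'value': s, 'leading': leading,
                    'middle': middle, 'trailing': trailing}))
```

`first_valid_index()` returns `None` when every value is missing, so an all-NaN series needs a separate check before these comparisons.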

You need to write it yourself:

# Modified sample data: 2019-07-29 and 2019-07-30 removed to create NaNs in the middle
print (df)
               ID  Steps
Day                     
2019-07-25  53668    0.0
2019-07-26  53668    0.0
2019-07-27  53668    0.0
2019-07-28  53668  100.0
2019-07-31  53668    0.0
2019-08-01  53668  100.0
2019-08-02  53668    0.0
2019-08-03  53668    0.0
2019-08-04  53668    0.0
2019-08-05  53668    0.0

idx = pd.date_range('2019-07-20','2019-08-08')
df = df.reindex(idx, fill_value = np.nan)
print (df)
                 ID  Steps
2019-07-20      NaN    NaN
2019-07-21      NaN    NaN
2019-07-22      NaN    NaN
2019-07-23      NaN    NaN
2019-07-24      NaN    NaN
2019-07-25  53668.0    0.0
2019-07-26  53668.0    0.0
2019-07-27  53668.0    0.0
2019-07-28  53668.0  100.0
2019-07-29      NaN    NaN
2019-07-30      NaN    NaN
2019-07-31  53668.0    0.0
2019-08-01  53668.0  100.0
2019-08-02  53668.0    0.0
2019-08-03  53668.0    0.0
2019-08-04  53668.0    0.0
2019-08-05  53668.0    0.0
2019-08-06      NaN    NaN
2019-08-07      NaN    NaN
2019-08-08      NaN    NaN

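m = df['ID'].isna()              # every missing row
first = df['ID'].ffill().isna()  # still NaN after forward fill -> leading NaNs
last = df['ID'].bfill().isna()   # still NaN after backward fill -> trailing NaNs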

print (df[first])
            ID  Steps
2019-07-20 NaN    NaN
2019-07-21 NaN    NaN
2019-07-22 NaN    NaN
2019-07-23 NaN    NaN
2019-07-24 NaN    NaN

print (df[last])
            ID  Steps
2019-08-06 NaN    NaN
2019-08-07 NaN    NaN
2019-08-08 NaN    NaN

print (df[~(first | last) & m])
            ID  Steps
2019-07-29 NaN    NaN
2019-07-30 NaN    NaN
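The original goal was to fill each kind of gap differently. A minimal sketch building on the same ffill/bfill masks, applied to a hypothetical `Steps` Series rather than the `ID` column; the fill policy (0 for leading gaps, linear interpolation in the middle, forward fill at the end) is purely an illustrative assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical daily step counts after reindexing, with NaN runs
# at the beginning, in the middle and at the end.
idx = pd.date_range('2019-07-20', periods=10)
steps = pd.Series([np.nan, np.nan, 0, 100, np.nan, np.nan, 0, 100, np.nan, np.nan],
                  index=idx)

m = steps.isna()
first = steps.ffill().isna()   # NaNs before the first valid value
last = steps.bfill().isna()    # NaNs after the last valid value
middle = m & ~(first | last)   # every other NaN

filled = steps.copy()
# Example policy only (an assumption; choose what fits your analysis):
filled[first] = 0                               # leading gap -> 0 steps
filled[middle] = steps.interpolate()[middle]    # interior gap -> linear interpolation
filled[last] = steps.ffill()[last]              # trailing gap -> carry last value forward
print(filled)
```

`interpolate()` with its default `method='linear'` treats the points as equally spaced, which suits a gap-free daily index; `method='time'` would weight by the actual timestamps instead.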

Answer 1 (2 votes, accepted), 2019-08-05 09:23:25