
How to locate consecutive NaNs at the beginning/end of a pandas datetime series

I have a time series mapping each day to the number of steps walked. I would like to fill missing values one way if they appear consecutively at the beginning of the date range I am examining, and another way if they are in the "middle" of the data or at the end. Is there a way, similar to str.startswith, to identify these consecutive NaNs?

Here's an example:

Original dataset:

               ID  Steps
Day                     
2019-07-25  53668    0.0
2019-07-26  53668    0.0
2019-07-27  53668    0.0
2019-07-28  53668  100.0
2019-07-29  53668    0.0
2019-07-30  53668    0.0
2019-07-31  53668    0.0
2019-08-01  53668  100.0
2019-08-02  53668    0.0
2019-08-03  53668    0.0
2019-08-04  53668    0.0
2019-08-05  53668    0.0

idx = pd.date_range('2019-07-20','2019-08-03')
df.reindex(idx, fill_value = np.nan)

yields:

                 ID  Steps
2019-07-20      NaN    NaN
2019-07-21      NaN    NaN
2019-07-22      NaN    NaN
2019-07-23      NaN    NaN
2019-07-24      NaN    NaN
2019-07-25  53668.0    0.0
2019-07-26  53668.0    0.0
2019-07-27  53668.0    0.0
2019-07-28  53668.0  100.0
2019-07-29  53668.0    0.0
2019-07-30  53668.0    0.0
2019-07-31  53668.0    0.0
2019-08-01  53668.0  100.0
2019-08-02  53668.0    0.0
2019-08-03  53668.0    0.0

How do I know that the NaNs here (the five rows from 2019-07-20 to 2019-07-24) are at the beginning, and not interspersed or at the end?

No, there is no such function.

You need to write it yourself:

# sample data changed so that NaNs also appear in the middle
print (df)
               ID  Steps
Day                     
2019-07-25  53668    0.0
2019-07-26  53668    0.0
2019-07-27  53668    0.0
2019-07-28  53668  100.0
2019-07-31  53668    0.0
2019-08-01  53668  100.0
2019-08-02  53668    0.0
2019-08-03  53668    0.0
2019-08-04  53668    0.0
2019-08-05  53668    0.0

idx = pd.date_range('2019-07-20','2019-08-08')
df = df.reindex(idx, fill_value = np.nan)
print (df)
                 ID  Steps
2019-07-20      NaN    NaN
2019-07-21      NaN    NaN
2019-07-22      NaN    NaN
2019-07-23      NaN    NaN
2019-07-24      NaN    NaN
2019-07-25  53668.0    0.0
2019-07-26  53668.0    0.0
2019-07-27  53668.0    0.0
2019-07-28  53668.0  100.0
2019-07-29      NaN    NaN
2019-07-30      NaN    NaN
2019-07-31  53668.0    0.0
2019-08-01  53668.0  100.0
2019-08-02  53668.0    0.0
2019-08-03  53668.0    0.0
2019-08-04  53668.0    0.0
2019-08-05  53668.0    0.0
2019-08-06      NaN    NaN
2019-08-07      NaN    NaN
2019-08-08      NaN    NaN

m = df['ID'].isna()
first = df['ID'].ffill().isna()
last = df['ID'].bfill().isna()
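
The three masks work because a forward fill cannot fill a NaN that has no valid value before it, and a backward fill cannot fill one that has no valid value after it. Whatever is still missing after `ffill()` is therefore a leading NaN, and whatever is still missing after `bfill()` is a trailing NaN. A minimal sketch on a toy Series (the values and the name `middle` are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Toy series (hypothetical values): 2 leading, 1 interior, 1 trailing NaN
s = pd.Series([np.nan, np.nan, 1.0, np.nan, 2.0, np.nan])

m = s.isna()                   # every missing position
first = s.ffill().isna()       # still missing after forward fill: leading NaNs
last = s.bfill().isna()        # still missing after backward fill: trailing NaNs
middle = m & ~(first | last)   # missing, but neither leading nor trailing

print(first.tolist())
print(last.tolist())
print(middle.tolist())
```

Summing a mask, for example `first.sum()`, gives the number of leading NaNs.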

print (df[first])
            ID  Steps
2019-07-20 NaN    NaN
2019-07-21 NaN    NaN
2019-07-22 NaN    NaN
2019-07-23 NaN    NaN
2019-07-24 NaN    NaN

print (df[last])
            ID  Steps
2019-08-06 NaN    NaN
2019-08-07 NaN    NaN
2019-08-08 NaN    NaN

print (df[~(first | last) & m])
            ID  Steps
2019-07-29 NaN    NaN
2019-07-30 NaN    NaN
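
Once the masks exist, each region can be filled with its own rule, which is what the question was after. The sketch below rebuilds the sample data and applies two fill rules that are assumptions chosen purely for illustration: leading gaps become 0 steps, interior gaps get the mean of the observed values, and trailing gaps are left as NaN. The names `middle` and `filled` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data: 2019-07-25 .. 2019-08-05 without 07-29 and 07-30
days = pd.date_range("2019-07-25", "2019-08-05").drop(
    pd.to_datetime(["2019-07-29", "2019-07-30"])
)
steps = [0.0, 0.0, 0.0, 100.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0]
df = pd.DataFrame({"ID": 53668, "Steps": steps}, index=days)

# Extend the index so that gaps appear at the start, in the middle and at the end
df = df.reindex(pd.date_range("2019-07-20", "2019-08-08"))

m = df["ID"].isna()
first = df["ID"].ffill().isna()    # leading NaNs
last = df["ID"].bfill().isna()     # trailing NaNs
middle = m & ~(first | last)       # interior NaNs

# Assumed fill rules (illustration only): 0 for leading gaps,
# mean of the observed steps for interior gaps, trailing gaps untouched
observed_mean = df["Steps"].mean()  # mean() skips NaN by default
filled = df["Steps"].mask(first, 0.0).mask(middle, observed_mean)

print(filled)
```

`Series.mask(cond, other)` replaces values where `cond` is True, so the rules can be chained without modifying `df` in place.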


 