How to locate consecutive NaNs at the beginning/end of a pandas datetime series
I have a time series mapping each day to the number of steps walked. I would like to fill missing values one way if they appear consecutively at the beginning of the date range I am examining, and another way if they are in the "middle" of the data or at the end. Is there a method similar to str.startswith for identifying these consecutive NaNs?
Here's an example:
Original dataset:
ID Steps
Day
2019-07-25 53668 0.0
2019-07-26 53668 0.0
2019-07-27 53668 0.0
2019-07-28 53668 100.0
2019-07-29 53668 0.0
2019-07-30 53668 0.0
2019-07-31 53668 0.0
2019-08-01 53668 100.0
2019-08-02 53668 0.0
2019-08-03 53668 0.0
2019-08-04 53668 0.0
2019-08-05 53668 0.0
idx = pd.date_range('2019-07-20','2019-08-03')
df.reindex(idx, fill_value = np.nan)
yields:
ID Steps
2019-07-20 NaN NaN
2019-07-21 NaN NaN
2019-07-22 NaN NaN
2019-07-23 NaN NaN
2019-07-24 NaN NaN
2019-07-25 53668.0 0.0
2019-07-26 53668.0 0.0
2019-07-27 53668.0 0.0
2019-07-28 53668.0 100.0
2019-07-29 53668.0 0.0
2019-07-30 53668.0 0.0
2019-07-31 53668.0 0.0
2019-08-01 53668.0 100.0
2019-08-02 53668.0 0.0
2019-08-03 53668.0 0.0
How do I know that the NaNs here are at the beginning and not interspersed, or at the end?
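No, there is no such function.
You need to write it yourself. After a forward fill, the only values still missing are NaNs with no valid value before them (the leading ones); after a backward fill, the only values still missing are NaNs with no valid value after them (the trailing ones):
# sample data changed so that NaNs also appear in the middle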
print (df)
ID Steps
Day
2019-07-25 53668 0.0
2019-07-26 53668 0.0
2019-07-27 53668 0.0
2019-07-28 53668 100.0
2019-07-31 53668 0.0
2019-08-01 53668 100.0
2019-08-02 53668 0.0
2019-08-03 53668 0.0
2019-08-04 53668 0.0
2019-08-05 53668 0.0
idx = pd.date_range('2019-07-20','2019-08-08')
df = df.reindex(idx, fill_value = np.nan)
print (df)
ID Steps
2019-07-20 NaN NaN
2019-07-21 NaN NaN
2019-07-22 NaN NaN
2019-07-23 NaN NaN
2019-07-24 NaN NaN
2019-07-25 53668.0 0.0
2019-07-26 53668.0 0.0
2019-07-27 53668.0 0.0
2019-07-28 53668.0 100.0
2019-07-29 NaN NaN
2019-07-30 NaN NaN
2019-07-31 53668.0 0.0
2019-08-01 53668.0 100.0
2019-08-02 53668.0 0.0
2019-08-03 53668.0 0.0
2019-08-04 53668.0 0.0
2019-08-05 53668.0 0.0
2019-08-06 NaN NaN
2019-08-07 NaN NaN
2019-08-08 NaN NaN
m = df['ID'].isna()
first = df['ID'].ffill().isna()
last = df['ID'].bfill().isna()
print (df[first])
ID Steps
2019-07-20 NaN NaN
2019-07-21 NaN NaN
2019-07-22 NaN NaN
2019-07-23 NaN NaN
2019-07-24 NaN NaN
print (df[last])
ID Steps
2019-08-06 NaN NaN
2019-08-07 NaN NaN
2019-08-08 NaN NaN
print (df[~(first | last) & m])
ID Steps
2019-07-29 NaN NaN
2019-07-30 NaN NaN
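For reference, the steps above can be collected into one self-contained script. This is only a sketch under a few assumptions: the helper name nan_position_masks is made up for illustration, the sample frame is rebuilt by hand from the printed data, and the fill policy at the end (0 for leading gaps, last known value for middle gaps, trailing gaps left as NaN) is a hypothetical choice rather than something the question specifies.

```python
import pandas as pd

# Rebuild the sample data from the answer (2019-07-29 and 2019-07-30 are absent).
days = pd.to_datetime([
    "2019-07-25", "2019-07-26", "2019-07-27", "2019-07-28",
    "2019-07-31", "2019-08-01", "2019-08-02", "2019-08-03",
    "2019-08-04", "2019-08-05",
])
steps = [0.0, 0.0, 0.0, 100.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0]
df = pd.DataFrame({"ID": 53668, "Steps": steps}, index=days)

# Reindexing to a wider daily range introduces NaN rows for the missing days.
df = df.reindex(pd.date_range("2019-07-20", "2019-08-08"))


def nan_position_masks(s):
    """Hypothetical helper: return (leading, middle, trailing) boolean masks."""
    missing = s.isna()
    leading = s.ffill().isna()   # no valid value before -> still NaN after ffill
    trailing = s.bfill().isna()  # no valid value after -> still NaN after bfill
    middle = missing & ~leading & ~trailing
    return leading, middle, trailing


leading, middle, trailing = nan_position_masks(df["Steps"])

# Assumed fill policy, for illustration only:
# leading gaps become 0, middle gaps take the last known value,
# trailing gaps are left as NaN.
filled = df["Steps"].mask(leading, 0.0)
filled = filled.where(~middle, df["Steps"].ffill())

print(df[leading])
print(df[middle])
print(df[trailing])
print(filled)
```

Note that if a column contains no valid values at all, every row ends up in both the leading and the trailing mask, so that case may need separate handling.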