Identifying consecutive NaN's with pandas part 2
I have a question related to the earlier question: Identifying consecutive NaN's with pandas
I am new to Stack Overflow, so I cannot add a comment, but I would like to know how I can partly keep the original index of the DataFrame when counting the number of consecutive NaNs.
So, starting from:
df = pd.DataFrame({'a':[1,2,np.nan, np.nan, np.nan, 6,7,8,9,10,np.nan,np.nan,13,14]})
df
Out[38]:
a
0 1
1 2
2 NaN
3 NaN
4 NaN
5 6
6 7
7 8
8 9
9 10
10 NaN
11 NaN
12 13
13 14
I would like to obtain the following:
Out[41]:
a
0 0
1 0
2 3
5 0
6 0
7 0
8 0
9 0
10 2
12 0
13 0
I have found a workaround. It is quite ugly, but it does the trick. I hope you don't have massive data, because it may not perform very well:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan, np.nan, np.nan, 6,7,8,9,10,np.nan,np.nan,13,14]})
# Count the NaNs that follow each non-NaN value (0 where none follow).
df1 = df.a.isnull().astype(int).groupby(df.a.notnull().astype(int).cumsum()).sum()
# Determine the different groups of NaNs. We only want to keep the 1st. The 0's are non-NaN values, the 1's are the first in a group of NaNs.
b = df.isna()
df2 = b.cumsum() - b.cumsum().where(~b).ffill().fillna(0).astype(int)
# .copy() so that the later update() acts on an independent frame, not on a slice
df2 = df2.loc[df2['a'] <= 1].copy()
# Set index from the non-zero 'NaN-count' to the index of the first NaN
df3 = df1.loc[df1 != 0]
df3.index = df2.loc[df2['a'] == 1].index
# Update the values from df3 (which has the right values, and the right index), to df2
df2.update(df3)
The NaN-group trick is inspired by this answer.
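For comparison, here is a more compact sketch that should give the same result on the example data. It is not the workaround above but a different technique: label each run of consecutive NaN / non-NaN values, broadcast the NaN count of the run to every row in it, and then keep only the non-NaN rows plus the first row of each NaN run. The names `is_nan`, `run_start`, `run_id`, `run_len`, `first_of_run` and `result` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})

is_nan = df["a"].isna()

# True wherever the NaN / non-NaN state differs from the previous row
run_start = is_nan.ne(is_nan.shift(fill_value=False))

# Consecutive rows with the same state share one run label
run_id = run_start.cumsum()

# NaN count of each run, repeated on every row of that run (0 for non-NaN runs)
run_len = is_nan.astype(int).groupby(run_id).transform("sum")

# First row of each NaN run: NaN here, not NaN on the previous row
first_of_run = is_nan & ~is_nan.shift(fill_value=False)

# Keep non-NaN rows and the first row of each NaN run; the original index is preserved
result = run_len[~is_nan | first_of_run].astype(int)
print(result)
```

Because everything here is a vectorised Series operation, this should scale better than updating one frame from another, though that has not been measured here.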