pandas - find index distance between batches of equal values in a row
I would like to find the "distance" between the starting points of two batches of 1's in a row, or in other words the length of batches of "1's followed by 0's" (indicated with spaces below).
So I start with the following series:
df = pd.Series([0,0, 1,1,1,0,0, 1,1,0, 1,1,1,0,0,0,0, 1,1,1,0,0,0, 1,1,0,0])
and would like to get the following output:
0 NaN
1 5.0
2 3.0
3 7.0
4 6.0
5 NaN
I know how to get either the counts of the number of 1's in a row or the counts of the number of 0's in a row, but I don't know how to deal with this pattern of 1's followed by 0's as a pattern of its own...
Having NaN's at the beginning and end would be the ideal case but is not necessary.
Use diff() to find the difference between consecutive values; a difference of 1 indicates the start of a new batch. Then you can use np.diff on the index:
s = df.diff().eq(1)
np.diff(s.index[s])
# or a one-liner
# np.diff(np.where(df.diff().eq(1))[0])
Output:
array([5, 3, 7, 6])
Note: There is an edge case where the series starts with a 1. diff() returns NaN for the first element, so a batch that starts at index 0 is not detected as a start.
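One possible way to cover that edge case, and to get the NaN-padded output asked for in the question, is sketched below. The helper name batch_start_distances is hypothetical, and treating the position before the series as 0 (via shift(fill_value=0)) is an assumption about the desired behaviour rather than part of the original answer.

```python
import numpy as np
import pandas as pd


def batch_start_distances(s: pd.Series) -> pd.Series:
    """Distances between the start positions of consecutive batches of 1's.

    Hypothetical helper: a batch start is a 1 whose previous value is not 1.
    The position before the series is assumed to hold 0, so a leading 1
    counts as a batch start.
    """
    is_start = s.eq(1) & s.shift(fill_value=0).ne(1)
    # Positional indices of the batch starts
    starts = np.flatnonzero(is_start.to_numpy())
    distances = np.diff(starts)
    # Pad with NaN at both ends to match the layout in the question
    return pd.Series(np.concatenate(([np.nan], distances, [np.nan])))


df = pd.Series([0,0, 1,1,1,0,0, 1,1,0, 1,1,1,0,0,0,0, 1,1,1,0,0,0, 1,1,0,0])
result = batch_start_distances(df)
print(result)
```

Because the padding values are NaN, the result has a float dtype, as in the desired output. If the distances should stay integers, drop the padding and use np.diff(starts) directly.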