Pandas - How to count consecutive Falses since the last occurrence of True in a time series without looping?

Question

Is there a Pythonic solution with pandas for the following problem?

Suppose I have a boolean mask Series called A:

[False, True, False, False, False, True, False, False]

I want to get a Series that counts the False values since the last occurrence of True. For the example above, this would output something like:

[NaN, 0, 1, 2, 3, 0, 1, 2]

And, as a bonus, also summarized to:

[NaN, 3, 2]

containing only the maximum count reached in each run of consecutive False values after a True value.
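For checking any vectorized answer against the expected output, a plain loop is a handy reference. The sketch below is only a baseline for comparison, not the loop-free solution being asked for; the function name `falses_since_last_true` is made up for illustration.

```python
import math

def falses_since_last_true(mask):
    """Loop-based reference: count False values since the most recent True."""
    counts = []
    current = math.nan  # stays NaN until the first True is seen
    for flag in mask:
        if flag:
            current = 0           # a True resets the counter
        elif not math.isnan(current):
            current += 1          # another False after a True
        counts.append(current)
    return counts

reference = falses_since_last_true(
    [False, True, False, False, False, True, False, False]
)
print(reference)
```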

Many thanks in advance

draj

Answer 1

Try this:

out = (~A).cumsum() - (~A).cumsum().where(A).ffill()

Out[1372]:
0    NaN
1    0.0
2    1.0
3    2.0
4    3.0
5    0.0
6    1.0
7    2.0
dtype: float64
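The one-liner above is easier to follow when split into its intermediate steps. This is a sketch on the sample data from the question; the names `false_total` and `baseline` are introduced here for illustration and do not appear in the answer.

```python
import pandas as pd

A = pd.Series([False, True, False, False, False, True, False, False])

# Running total of False values seen so far: 1, 1, 2, 3, 4, 4, 5, 6
false_total = (~A).cumsum()

# Keep that total only at True positions, then carry it forward.
# Rows before the first True have nothing to carry, so they stay NaN.
baseline = false_total.where(A).ffill()

# Falses since the last True = total so far minus the total at the last True.
out = false_total - baseline
print(out.tolist())
```

The NaN in the first row comes from `ffill()` having no earlier True to propagate, which is also why the result is float rather than integer.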

If you want the summary, derive it from out above:

out_sum = out[A.shift(-1, fill_value=True) & out.ne(0)]

Out[1411]:
0    NaN
4    3.0
7    2.0
dtype: float64
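The filter used for the summary combines two boolean masks. The sketch below names them separately (`ends_run` and `nonzero` are illustrative names, not part of the answer) to show which rows each one keeps.

```python
import pandas as pd

A = pd.Series([False, True, False, False, False, True, False, False])
out = (~A).cumsum() - (~A).cumsum().where(A).ffill()

# True where the next element is True, or where the series ends
# (fill_value=True treats the end as if a True followed it).
ends_run = A.shift(-1, fill_value=True)

# Drops the zeros sitting at True positions. NaN != 0 evaluates to True,
# so the leading NaN row passes this filter.
nonzero = out.ne(0)

out_sum = out[ends_run & nonzero]
print(out_sum.index.tolist())
```

One consequence of `out.ne(0)` worth keeping in mind: a True that is immediately followed by another True has a count of 0 and is therefore left out of the summary rather than reported as 0.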

Answer 2

If you only want to work with a Series, you can adapt @kiki's answer this way:

import numpy as np
import pandas as pd
s = pd.Series([False, True, False, False, False, True, False, False])
(s.groupby(s.cumsum()).count() - 1).replace(0, np.nan).tolist()

That said, if you want to understand what is happening under the hood, I think @kiki's answer is a bit more transparent.

Output:

[nan, 3.0, 2.0]
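Note that `replace(0, np.nan)` turns every zero into NaN, including the count for a True that is directly followed by another True. If that distinction matters, one possible variation is to blank out only the group that precedes the first True. The input below is a hypothetical example, not from the original post, and `group_id`, `lengths`, `replaced`, and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical input with two adjacent True values.
s = pd.Series([False, True, True, False, False])

group_id = s.cumsum()                       # 0, 1, 2, 2, 2
lengths = s.groupby(group_id).count() - 1   # groups 0, 1, 2 -> 0, 0, 2

# Approach from the answer: every zero becomes NaN.
replaced = lengths.replace(0, np.nan)

# Variation: only group 0 (rows before the first True) becomes NaN.
result = lengths.astype(float)
result.loc[result.index == 0] = np.nan

print(replaced.tolist())
print(result.tolist())
```

If the series starts with True, group 0 does not exist and the variation leaves all counts untouched.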

Also, for the complete Series it's just:

(s.groupby(s.cumsum()).cumcount())

Output 2:

Please tell me if having a zero instead of a NaN in the first row is a problem.
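If the NaN is needed, a small adjustment is to mask the rows that come before the first True, which are exactly the rows whose cumulative sum is still 0. This is a sketch on the sample data; `group_id`, `counts`, and `counts_nan` are names chosen here for illustration.

```python
import pandas as pd

s = pd.Series([False, True, False, False, False, True, False, False])

group_id = s.cumsum()
counts = s.groupby(group_id).cumcount()   # 0, 0, 1, 2, 3, 0, 1, 2

# group_id == 0 marks rows before the first True; where() sets them to NaN.
counts_nan = counts.where(group_id > 0)
print(counts_nan.tolist())
```

Introducing NaN converts the result from integer to float, matching the dtype of the output in Answer 1.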

Answer 3

I think the cumsum function can help you create a kind of id that increments at each occurrence of True. You can then group by that id and do whatever you need:

import pandas as pd

res = pd.DataFrame([False, True, False, False, False, True, False, False], columns=['val'])
res['cumsum'] = res['val'].cumsum()  # group id: increments at every True
res.groupby('cumsum').count() - 1    # group size minus one

Output:
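To see the ids that cumsum assigns and the resulting per-group counts on the sample data, the two steps can be inspected separately. This is a sketch; `per_group` is a name introduced here for illustration.

```python
import pandas as pd

res = pd.DataFrame(
    [False, True, False, False, False, True, False, False], columns=["val"]
)
res["cumsum"] = res["val"].cumsum()

# Each True starts a new id; the Falses after it share that id.
print(res["cumsum"].tolist())

# Rows per id, minus one. Group 0 has no True, so its result of 0 here
# corresponds to the NaN entry asked for in the question.
per_group = res.groupby("cumsum").count() - 1
print(per_group["val"].tolist())
```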

Answer 4

An adaptation of @Andy L's answer to a DataFrame:

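import pandas as pd

df = pd.DataFrame({'values': [False, True, False, False, False, True, False, False]})

# running count of Falses since the last True (NaN before the first True)
df['cumsum'] = (~df['values']).cumsum() - (~df['values']).cumsum().where(df['values']).ffill()
# rows just before each True, plus the last row
grouped = pd.concat([df.loc[df[df['values']].index - 1, :], df.tail(1)])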

Output:

    values  cumsum
0    False     NaN
1     True     0.0
2    False     1.0
3    False     2.0
4    False     3.0
5     True     0.0
6    False     1.0
7    False     2.0

Grouped output:

    values  cumsum
0    False     NaN
4    False     3.0
7    False     2.0
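The `index - 1` lookup assumes a default integer index and a first row that is not True; otherwise it asks for label -1, which does not exist in a default RangeIndex. A possible alternative, borrowing the `shift` idea from Answer 1, selects the last row of each run of False values with a boolean mask. The input below is a hypothetical example that starts with True, and `is_run_end` is an illustrative name.

```python
import pandas as pd

# Hypothetical input whose first row is True (not from the original post).
df = pd.DataFrame({"values": [True, False, False, True]})

not_values = ~df["values"]
df["cumsum"] = not_values.cumsum() - not_values.cumsum().where(df["values"]).ffill()

# Last row of each False run: a False row whose successor is True,
# or the final row if it is False (fill_value=True covers that case).
is_run_end = not_values & df["values"].shift(-1, fill_value=True)
grouped = df[is_run_end]
print(grouped)
```

On the sample data from the question this mask selects rows 0, 4, and 7, the same rows as the grouped output above.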

Answer 1
3 2020-03-03 17:52:40

Answer 2
3 2020-03-03 17:58:47

Answer 3
1 2020-03-03 17:54:40

Answer 4
1 2020-03-03 18:17:54
