Count consecutive occurrences by condition in pandas

Question

I have the following dataframe:

import pandas as pd

data = {'A': [0,0,0,1,1,1,0,1], 'B': [0,1,1,1,1,1,1,1], 'C': [1,0,1,0,1,1,1,0]}
df = pd.DataFrame(data)
df = df.transpose()
# Label the columns with the time periods
df = df.rename(index=str, columns={0: "20062", 1: "20063", 2: "20064", 3: "20071", 4: "20072", 5: "20073", 6: "20074", 7: "20081"})


Out[135]: 
   20062  20063  20064  20071  20072  20073  20074  20081
A      0      0      0      1      1      1      0      1
B      0      1      1      1      1      1      1      1
C      1      0      1      0      1      1      1      0

My main task is to find the number of "disappearances".

Let us define a "disappearance" as the case where a 1 is followed by a 0.

So the expected outcome in this example is that A disappears only once (in 20074), B disappears zero times, and C disappears 3 times (in 20063, 20071, and 20081 respectively).

I want to compute the following:

total number of disappearances by time (the columns in this example, so there was one disappearance in 20063, one in 20071, etc.)
by type: A disappeared once in 20074, C disappeared 3 times in 20063, 20071 and 20081
total number of disappearances (here 4)

Can someone show me how to do this in Python?

My dataframe is quite large, so ideally I am looking for a general solution.

Thanks

Answer 1

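You can use diff along the columns and sum with axis=None to get the total number of disappearances. A 1 followed by a 0 gives a difference of -1, so counting the -1 entries counts the disappearances.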

>>> df.diff(axis=1).eq(-1).values.sum(axis=None)
4
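
To see why this works, here is a minimal sketch that rebuilds the question's dataframe and looks at the intermediate steps. The variable names (steps, disappeared, total) are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet runs on its own.
data = {'A': [0, 0, 0, 1, 1, 1, 0, 1],
        'B': [0, 1, 1, 1, 1, 1, 1, 1],
        'C': [1, 0, 1, 0, 1, 1, 1, 0]}
periods = ["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"]
df = pd.DataFrame(data).transpose()
df.columns = periods

# Difference between each column and the column to its left.
# The first column has no left neighbour, so it becomes NaN.
steps = df.diff(axis=1)

# A drop from 1 to 0 shows up as -1; NaN never compares equal to -1.
disappeared = steps.eq(-1)

# Count the True cells over the whole frame.
total = int(disappeared.to_numpy().sum())

print(disappeared)
print(total)
```

The boolean frame disappeared is the building block for everything else: summing it over rows, columns, or the whole frame gives the three counts asked for in the question.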

To get the count per row, sum across axis=1

df.diff(axis=1).eq(-1).sum(axis=1)

A    1
B    0
C    3
dtype: int64
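
The question also asks in which periods each row disappears, not only how often. One possible sketch, assuming the same 0/1 dataframe as in the question, is to use the boolean mask to pick out the column labels for each row. The name periods_by_row is made up for this example.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet runs on its own.
data = {'A': [0, 0, 0, 1, 1, 1, 0, 1],
        'B': [0, 1, 1, 1, 1, 1, 1, 1],
        'C': [1, 0, 1, 0, 1, 1, 1, 0]}
df = pd.DataFrame(data).transpose()
df.columns = ["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"]

# True wherever a 1 in the previous column is followed by a 0.
mask = df.diff(axis=1).eq(-1)

# For every row, keep the column labels where the mask is True.
periods_by_row = {
    row: list(mask.columns[mask.loc[row].to_numpy()])
    for row in mask.index
}

print(periods_by_row)
```

This loops over rows in Python, so for a very large dataframe it may be slower than the vectorised sums; it is meant as an illustration rather than a tuned solution.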

To get the count per time period, sum across axis=0

df.diff(axis=1).eq(-1).sum(axis=0)

20062    0
20063    1
20064    0
20071    1
20072    0
20073    0
20074    1
20081    1
dtype: int64
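
Since the dataframe is large, it may be worth computing the diff only once and reusing it for all three counts. A minimal sketch, where disappearance_summary is a hypothetical helper name and the input is assumed to contain only 0 and 1:

```python
import pandas as pd


def disappearance_summary(frame):
    """Hypothetical helper: build the 1 -> 0 mask once and summarise it."""
    mask = frame.diff(axis=1).eq(-1)
    return {
        "per_time": mask.sum(axis=0),         # one count per column
        "per_row": mask.sum(axis=1),          # one count per row
        "total": int(mask.to_numpy().sum()),  # single overall count
    }


# Same data as in the question, rebuilt so the snippet runs on its own.
data = {'A': [0, 0, 0, 1, 1, 1, 0, 1],
        'B': [0, 1, 1, 1, 1, 1, 1, 1],
        'C': [1, 0, 1, 0, 1, 1, 1, 0]}
df = pd.DataFrame(data).transpose()
df.columns = ["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"]

summary = disappearance_summary(df)
print(summary["per_time"])
print(summary["per_row"])
print(summary["total"])
```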

Answer 2

First mask all 0 values to NaN, then forward-fill each row with ffill, compare the result with the original dataframe, and sum the differences per row

(df.mask(df==0).ffill(axis=1).fillna(0)!=df).sum(axis=1)
Out[146]: 
A    1
B    0
C    3
dtype: int64
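
One thing to keep in mind: on the question's data both answers agree, but they are not the same in general. The forward-fill approach flags every 0 that comes after a 1, while the diff approach only flags the first 0 of each run. A small sketch on a made-up row (the toy frame and its labels are purely illustrative):

```python
import pandas as pd

# Made-up row with two consecutive zeros after a 1.
toy = pd.DataFrame([[1, 0, 0, 1]], index=["X"], columns=["t1", "t2", "t3", "t4"])

# diff approach: counts 1 -> 0 transitions only.
by_diff = toy.diff(axis=1).eq(-1).sum(axis=1)

# forward-fill approach: counts every 0 that appears after a 1.
by_ffill = (toy.mask(toy == 0).ffill(axis=1).fillna(0) != toy).sum(axis=1)

print(by_diff["X"], by_ffill["X"])
```

Under the question's definition (a 1 followed by a 0), the diff version matches more closely if the real data can contain runs of zeros.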

Answer 1: score 2, accepted, 2019-04-05 15:16:07

Answer 2: score 1, 2019-04-05 15:16:10
