Count consecutive occurrences by condition in pandas
I have the following dataframe:
import pandas as pd
data = {'A': [0,0,0,1,1,1,0,1], 'B': [0,1,1,1,1,1,1,1], 'C': [1,0,1,0,1,1,1,0]}
df = pd.DataFrame(data)
df = df.transpose()
df = df.rename(index=str, columns={0: "20062", 1: "20063", 2: "20064", 3: "20071", 4: "20072", 5: "20073", 6: "20074", 7: "20081"})
Out[135]:
   20062  20063  20064  20071  20072  20073  20074  20081
A      0      0      0      1      1      1      0      1
B      0      1      1      1      1      1      1      1
C      1      0      1      0      1      1      1      0
My main task is to find the number of "disappearances".
Let a "disappearance" be defined as the case where a 1 is followed by a 0.
So, the expected outcome in this example is that A disappears only once (in 20074), B disappears zero times, and C disappears 3 times (in 20063, 20071, and 20081 respectively).
I want to do the following:
Can someone show me how to do this in Python?
My dataframe is quite large, so ideally I am looking for a general solution.
Thanks
You can use diff along axis=1, compare the result with -1, and sum with axis=None to get the total number of disappearances:
>>> df.diff(axis=1).eq(-1).values.sum(axis=None)
4
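To see why this works, it may help to look at the intermediate result. The following is a minimal self-contained sketch that rebuilds the frame from the question; the names `periods`, `steps`, `disappeared`, and `total` are illustrative and not part of the original answer.

```python
import pandas as pd

data = {'A': [0,0,0,1,1,1,0,1], 'B': [0,1,1,1,1,1,1,1], 'C': [1,0,1,0,1,1,1,0]}
periods = ["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"]
df = pd.DataFrame(data).transpose()
df.columns = periods

# diff(axis=1) subtracts the previous column from each column:
# 1 -> 0 gives -1, 0 -> 1 gives +1, no change gives 0.
# The first column has no predecessor, so it becomes NaN.
steps = df.diff(axis=1)

# A disappearance (a 1 followed by a 0) is exactly a step of -1.
disappeared = steps.eq(-1)

total = int(disappeared.values.sum())
print(total)
```

Because NaN never equals -1, the first column can never be counted as a disappearance.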
To get the count per row, sum across axis=1:
df.diff(axis=1).eq(-1).sum(axis=1)
A 1
B 0
C 3
dtype: int64
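Since the question mentions a large dataframe, the per-row count can also be sketched directly in NumPy by comparing each column with its left neighbour. This is an alternative formulation, not the answer's method, and it assumes the frame holds only 0 and 1; the names `a`, `drops`, and `per_row` are illustrative. Whether it is actually faster on your data is something to measure.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 1, 1, 1, 0, 1],
     [0, 1, 1, 1, 1, 1, 1, 1],
     [1, 0, 1, 0, 1, 1, 1, 0]],
    index=["A", "B", "C"],
    columns=["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"],
)

a = df.to_numpy()

# a[:, :-1] holds the previous values, a[:, 1:] the current ones.
# A disappearance is "previous == 1 and current == 0".
drops = (a[:, :-1] == 1) & (a[:, 1:] == 0)

# Count the True entries in each row and reattach the row labels.
per_row = pd.Series(drops.sum(axis=1), index=df.index)
print(per_row)
```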
To get the count per time period, sum across axis=0:
df.diff(axis=1).eq(-1).sum(axis=0)
20062 0
20063 1
20064 0
20071 1
20072 0
20073 0
20074 1
20081 1
dtype: int64
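The question also names the periods in which each row disappears. One way to recover them from the same boolean mask is sketched below; this goes slightly beyond the original answer, and the names `mask` and `where` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 1, 1, 1, 0, 1],
     [0, 1, 1, 1, 1, 1, 1, 1],
     [1, 0, 1, 0, 1, 1, 1, 0]],
    index=["A", "B", "C"],
    columns=["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"],
)

# True wherever a 1 in the previous column is followed by a 0.
mask = df.diff(axis=1).eq(-1)

# For each row, keep the column labels where the mask is True.
where = {row: list(mask.columns[mask.loc[row].to_numpy()]) for row in mask.index}
print(where)
```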
First mask all 0s to NaN, then ffill along each row, compare the result with the original df, and sum the mismatches:
(df.mask(df==0).ffill(axis=1).fillna(0)!=df).sum(axis=1)
Out[146]:
A 1
B 0
C 3
dtype: int64
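The diff and ffill approaches agree on this example, but they are not equivalent in general. The ffill version counts every 0 that appears after a 1 has been seen, so a run of consecutive zeros is counted once per zero rather than once per 1-to-0 transition. A small sketch with a made-up row (the labels `X` and `t1` to `t4` are hypothetical) shows the difference:

```python
import pandas as pd

# One row where a 1 is followed by two consecutive 0s.
row = pd.DataFrame([[1, 0, 0, 1]], index=["X"], columns=["t1", "t2", "t3", "t4"])

# diff approach: counts only the 1 -> 0 transition at t2.
by_diff = row.diff(axis=1).eq(-1).sum(axis=1)

# ffill approach: counts both zeros (t2 and t3) that follow the first 1.
by_ffill = (row.mask(row == 0).ffill(axis=1).fillna(0) != row).sum(axis=1)

print(by_diff["X"], by_ffill["X"])
```

If a disappearance is meant strictly as "a 1 followed by a 0", as defined in the question, the diff version matches that definition on rows like this one.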