Efficiently merging subsequences in pandas

I have predictions from an ML model in the form of a pandas Series (binary only). For example: pd.Series([0,0,0,1,1,0,0,1,0,1]).

I want to merge subsequences of 1's if the number of 0's between them is less than or equal to some threshold. For example, if the threshold is 1, I want to get the following series instead: pd.Series([0,0,0,1,1,0,0,1,1,1]).

If the threshold is 2: pd.Series([0,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0]) -> pd.Series([0,1,1,1,1,1,1,1,1,1,0,0,0,0,1,0]).
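To make the rule concrete, the size of each gap can be read off the positions of the 1's. The sketch below uses NumPy's `np.flatnonzero` and `np.diff`; the helper name `zero_gaps` is hypothetical and it assumes a 0/1 integer Series. For the two examples it gives gaps of [0, 2, 1] and [1, 2, 2, 4]: only the 2-zero gap in the first series (threshold 1) and the 4-zero gap in the second (threshold 2) are left unfilled in the expected outputs, so a gap equal to the threshold is merged.

```python
import numpy as np
import pandas as pd

def zero_gaps(s: pd.Series) -> np.ndarray:
    """Hypothetical helper: number of 0's strictly between consecutive 1's."""
    ones = np.flatnonzero(s.to_numpy())  # positions of the 1's
    return np.diff(ones) - 1             # distance between 1's, minus the 1 itself

ex1 = pd.Series([0, 0, 0, 1, 1, 0, 0, 1, 0, 1])
ex2 = pd.Series([0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0])

# Which gaps would be merged at the thresholds used in the examples
print(zero_gaps(ex1), zero_gaps(ex1) <= 1)  # threshold 1
print(zero_gaps(ex2), zero_gaps(ex2) <= 2)  # threshold 2
```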

Of course, it is possible to do this by iterating over the Series row by row, but I was wondering whether there is an efficient way to do it using pandas methods.
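For comparison, the row-by-row approach could look like the minimal sketch below (the function name `merge_loop` is hypothetical). It remembers the position of the last 1 seen and, on reaching the next 1, fills the 0's in between when there are at most `threshold` of them. It is slow for long series but useful as a reference to check a vectorized version against.

```python
import pandas as pd

def merge_loop(s: pd.Series, threshold: int) -> pd.Series:
    """Hypothetical reference: fill runs of at most `threshold` 0's between two 1's."""
    values = s.tolist()
    out = values[:]
    last_one = None  # position of the most recent 1 seen so far
    for i, v in enumerate(values):
        if v == 1:
            # i - last_one - 1 is the number of 0's since the previous 1
            if last_one is not None and i - last_one - 1 <= threshold:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return pd.Series(out, index=s.index)

print(merge_loop(pd.Series([0, 0, 0, 1, 1, 0, 0, 1, 0, 1]), 1).tolist())
# [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
```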

Seems like you need something like this (using the threshold-2 example):

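import pandas as pd

s = pd.Series([0,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0])
threshold = 2

v = s.loc[s.idxmax():s.iloc[::-1].idxmax()]  # exclude the leading and trailing 0's
key = v.eq(1).cumsum()  # create the key: each group is one 1 plus the 0's that follow it
v = v.mask(key.groupby(key).transform('size') <= threshold + 1, 1)  # a group of size n holds n - 1 zeros; fill it if n - 1 <= threshold
s.update(v)  # use update to write the result back into the original series
s
0     0
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    0
11    0
12    0
13    0
14    1
15    0
dtype: int64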
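An alternative sketch, using a different technique from the answer above rather than the original author's code: label each run of equal consecutive values with `s.ne(s.shift()).cumsum()`, get each run's length with `groupby(...).transform('size')`, and fill the 0-runs that are neither the first nor the last run (so they are bounded by 1's on both sides) and are at most `threshold` long. The function name `merge_short_gaps` is hypothetical, and it assumes a non-empty 0/1 integer Series. It returns a new Series instead of calling `update`, and it leaves an all-zero series unchanged.

```python
import pandas as pd

def merge_short_gaps(s: pd.Series, threshold: int) -> pd.Series:
    """Hypothetical helper: fill 0-runs of length <= threshold that lie between two 1's."""
    # Label runs of identical consecutive values: 0,0,1,1,0 -> 1,1,2,2,3
    run_id = s.ne(s.shift()).cumsum()
    # Length of the run each element belongs to
    run_len = s.groupby(run_id).transform('size')
    # Runs alternate between 0 and 1, so a 0-run is bounded by 1's
    # unless it is the first or the last run of the series
    interior = (run_id != run_id.iloc[0]) & (run_id != run_id.iloc[-1])
    fill = s.eq(0) & interior & (run_len <= threshold)
    return s.mask(fill, 1)

s = pd.Series([0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0])
print(merge_short_gaps(s, 2).tolist())
# [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
```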

Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.