Efficiently merging subsequences in pandas
I have predictions from an ML model in the form of a pandas Series (binary only). For example: pd.Series([0,0,0,1,1,0,0,1,0,1]).
I want to merge subsequences of 1's if the number of 0's between them does not exceed some threshold. For example, if the threshold is 1, I want to get the following series instead: pd.Series([0,0,0,1,1,0,0,1,1,1]).
If the threshold is 2: pd.Series([0,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0]) -> pd.Series([0,1,1,1,1,1,1,1,1,1,0,0,0,0,1,0]).
Of course, it is possible to do this by iterating over the Series row by row, but I was wondering whether there is a more efficient way of doing it with pandas methods?
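For reference, the row-by-row baseline mentioned above could look roughly like the sketch below. The function name `merge_runs_loop` is hypothetical, and the sketch assumes the threshold is inclusive (a gap of exactly `threshold` zeros is still merged), which is what both examples imply.

```python
import pandas as pd

def merge_runs_loop(s: pd.Series, threshold: int) -> pd.Series:
    """Hypothetical baseline: fill gaps of at most `threshold` 0s between 1s."""
    values = s.tolist()
    last_one = None  # position of the most recent 1 seen so far
    for i, x in enumerate(values):
        if x == 1:
            # Number of 0s between the previous 1 and this one.
            gap = i - last_one - 1 if last_one is not None else 0
            if 0 < gap <= threshold:
                values[last_one + 1:i] = [1] * gap
            last_one = i
    return pd.Series(values, index=s.index)

print(merge_runs_loop(pd.Series([0, 0, 0, 1, 1, 0, 0, 1, 0, 1]), 1).tolist())
```

Leading and trailing 0s are never touched, because a gap is only filled once a 1 has been seen on both sides of it.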
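It seems like you need something like this:
import pandas as pd
s = pd.Series([0,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0])
threshold = 2
v = s.loc[s.idxmax():s.iloc[::-1].idxmax()]  # exclude the leading and trailing 0s
s1 = v.eq(1).cumsum()  # create the key: each 1 starts a new group that also holds the 0s after it
s1 = v.mask(s1.groupby(s1).transform('count') <= threshold + 1, 1)  # a group is one 1 plus its 0s, so size <= threshold + 1 means at most threshold 0s
s.update(s1)  # use update to write the result back to the original series
s
0 0
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 0
11 0
12 0
13 0
14 1
15 0
dtype: int64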
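The same idea can be wrapped in a reusable function. The following is only a sketch under stated assumptions: the name `merge_runs` is hypothetical, the index is assumed to be unique, and the threshold is taken to be inclusive, as both examples in the question imply. Instead of slicing off the leading and trailing 0s, it labels every run of equal values, measures each run's length, and fills only the short runs of 0s that have a 1 somewhere before and after them.

```python
import pandas as pd

def merge_runs(s: pd.Series, threshold: int) -> pd.Series:
    """Hypothetical helper: fill runs of at most `threshold` 0s lying between 1s."""
    is_zero = s.eq(0)
    # A new run starts wherever the value differs from the previous one.
    run_id = s.ne(s.shift()).cumsum()
    # Length of the run each element belongs to.
    run_len = is_zero.groupby(run_id).transform("count")
    # Interior positions have a 1 somewhere before and somewhere after them.
    one_before = s.cummax().eq(1)
    one_after = s.iloc[::-1].cummax().iloc[::-1].eq(1)
    fill = is_zero & (run_len <= threshold) & one_before & one_after
    return s.mask(fill, 1)

print(merge_runs(pd.Series([0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]), 2).tolist())
```

Unlike the in-place `update` approach, this returns a new Series and leaves the input unchanged, and a Series without any 1s comes back as it was.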