Running total of consecutive identical values
How can I get a running total of consecutive 1's in a pandas Series?
For example, given s = pd.Series([5, 1, 4, 1, 1, 2, 3, 1, 1, 1, 4]), I want to obtain pd.Series([0, 1, 0, 1, 2, 0, 0, 1, 2, 3, 0]).
(Pandas 0.18.0)
You can try groupby with cumcount, grouping on the cumsum of the comparison s1 != 1 (here s1 is the Series from the question):
print(s1.groupby((s1 != 1).cumsum()).cumcount())
0 0
1 1
2 0
3 1
4 2
5 0
6 0
7 1
8 2
9 3
10 0
dtype: int64
Better explanation:
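import pandas as pd
# the output below was produced with this slightly longer series (see the 'orig' column)
s1 = pd.Series([5, 1, 3, 4, 1, 1, 2, 3, 1, 1, 1, 4])
df = pd.DataFrame(s1, columns=['orig'])
df['not1'] = s1 != 1
df['cumsum'] = (s1 != 1).cumsum()
df['cumcount'] = s1.groupby((s1 != 1).cumsum()).cumcount()
# s1.groupby((s1 != 1).cumsum()).cumcount() is the same as:
df['cumcount1'] = df.groupby('cumsum')['orig'].cumcount()
print(df)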
    orig   not1  cumsum  cumcount  cumcount1
0      5   True       1         0          0
1      1  False       1         1          1
2      3   True       2         0          0
3      4   True       3         0          0
4      1  False       3         1          1
5      1  False       3         2          2
6      2   True       4         0          0
7      3   True       5         0          0
8      1  False       5         1          1
9      1  False       5         2          2
10     1  False       5         3          3
11     4   True       6         0          0
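The grouping idea above can be tried on the question's own data with the following minimal sketch. The variable names group_id, result and leading are illustrative and not from the original answer. It also shows one case worth keeping in mind: a run of 1's at the very start of the series has no non-1 value in front of it, so this variant counts it from 0 rather than 1.

```python
import pandas as pd

s1 = pd.Series([5, 1, 4, 1, 1, 2, 3, 1, 1, 1, 4])

# Every value that is not 1 opens a new group; the 1's that follow it
# land in the same group, so cumcount() numbers them 1, 2, 3, ...
group_id = (s1 != 1).cumsum()
result = s1.groupby(group_id).cumcount()
print(result.tolist())   # [0, 1, 0, 1, 2, 0, 0, 1, 2, 3, 0]

# Caveat: a leading run of 1's shares group 0 with nothing before it,
# so its first element gets 0 instead of 1.
leading = pd.Series([1, 1, 5])
print(leading.groupby((leading != 1).cumsum()).cumcount().tolist())   # [0, 1, 0]
```

If the series may start with the target value, the shift-based variant does not have this limitation.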
Or:
print((s1 == 1) * (s1.groupby((s1 != s1.shift()).cumsum()).cumcount() + 1))
0 0
1 1
2 0
3 1
4 2
5 0
6 0
7 1
8 2
9 3
10 0
dtype: int64
Explanation:
# the output below was produced with this series (see the 'orig' column)
s1 = pd.Series([5, 1, 3, 4, 1, 1, 2, 3, 1, 1, 1, 4])
df = pd.DataFrame(s1, columns=['orig'])
df['compare_shift'] = s1 != s1.shift()
df['cumsum'] = (s1 != s1.shift()).cumsum()
df['cumcount'] = s1.groupby((s1 != s1.shift()).cumsum()).cumcount() + 1
df['cumcount1'] = df.groupby('cumsum')['orig'].cumcount() + 1
df['is1'] = (s1 == 1)
# True is converted to 1, False to 0
df['fin'] = (s1 == 1) * (s1.groupby((s1 != s1.shift()).cumsum()).cumcount() + 1)
print(df)
    orig  compare_shift  cumsum  cumcount  cumcount1    is1  fin
0      5           True       1         1          1  False    0
1      1           True       2         1          1   True    1
2      3           True       3         1          1  False    0
3      4           True       4         1          1  False    0
4      1           True       5         1          1   True    1
5      1          False       5         2          2   True    2
6      2           True       6         1          1  False    0
7      3           True       7         1          1  False    0
8      1           True       8         1          1   True    1
9      1          False       8         2          2   True    2
10     1          False       8         3          3   True    3
11     4           True       9         1          1  False    0
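The shift-based variant can be wrapped in a small function so that the target value becomes a parameter. This is only a sketch: the name running_count and the value argument are assumptions made for illustration and do not appear in the original answer.

```python
import pandas as pd

def running_count(s, value=1):
    """Position within each run of `value`, 0 elsewhere (hypothetical helper)."""
    # A new run starts wherever an element differs from its predecessor.
    run_id = (s != s.shift()).cumsum()
    # 1-based position of every element inside its own run.
    position = s.groupby(run_id).cumcount() + 1
    # Keep the position only where the element equals the target value.
    return (s == value).astype(int) * position

s = pd.Series([5, 1, 4, 1, 1, 2, 3, 1, 1, 1, 4])
print(running_count(s).tolist())   # [0, 1, 0, 1, 2, 0, 0, 1, 2, 3, 0]
```

Because every run gets its own group, a series that starts with the target value is counted from 1 as expected, for example running_count(pd.Series([1, 1, 5, 1])) gives 1, 2, 0, 1.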
Not the prettiest way (and likely not the most efficient), but the following gets the job done (and is about 4.5x faster than the other looping answer):
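import pandas as pd
s = pd.Series([5, 1, 4, 1, 1, 2, 3, 1, 1, 1, 4])
def consecutive_n(s, n=1):
    # running count of n over the whole series, 0 where the value is not n
    a = s[s == n].cumsum().reindex(s.index).fillna(0) / n
    # value of a at the first element of every run after the first one
    b = a[a.diff() > 1]
    # offset that brings the count back to 1 at each of those run starts, carried forward
    c = (1 - b).reindex(s.index).ffill().fillna(0)
    return (a + c).apply(lambda x: max(x, 0)).astype(int)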
>>> consecutive_n(s, n=1)
0 0
1 1
2 0
3 1
4 2
5 0
6 0
7 1
8 2
9 3
10 0
dtype: int64
Some explanation of the intermediate values:
a: the running count of 1's (or n's) over the whole series, so each 1 is labeled as the nth occurrence; all other positions are 0.
c: how much has to be added to a to "reset" the occurrence count when a different number appears between 1's (or n's).
Return value: the lambda zeroes out the negative numbers that result from a + c.
EDIT: Changed the code slightly so that it works for any positive integer. Example:
>>> t = pd.Series([1, 2, 3, 1, 4, 2, 2, 3, 2, 2, 2, 1])
>>> consecutive_n(t, 2)
0 0
1 1
2 0
3 0
4 0
5 1
6 2
7 0
8 1
9 2
10 3
11 0
dtype: int64
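To sanity-check any of the vectorized versions, a plain loop can serve as a reference. The following is a minimal sketch. running_count_loop is a hypothetical name, and this is not the "other looping answer" mentioned above, which is not reproduced here.

```python
import pandas as pd

def running_count_loop(s, n=1):
    """Reference implementation: count consecutive occurrences of n."""
    counts = []
    current = 0
    for value in s:
        # Extend the current run on a match, otherwise reset to 0.
        current = current + 1 if value == n else 0
        counts.append(current)
    return pd.Series(counts, index=s.index)

t = pd.Series([1, 2, 3, 1, 4, 2, 2, 3, 2, 2, 2, 1])
print(running_count_loop(t, 2).tolist())   # [0, 1, 0, 0, 0, 1, 2, 0, 1, 2, 3, 0]
```

It is slow on large inputs, but it makes the intended behaviour explicit and is useful for testing edge cases such as a series that begins with a run longer than one element.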