Count consecutive values in an array with multiple values (numpy/pandas)

Question

I checked this question and others on SO, but the trick there is always to sum True or False values.

My case is the following array:

arr = [1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4]

For each element of the array, I want the length of the "current" streak of repeated values up to and including that element.

For the example above I would like to get:

res = [1,1,1,2,1,1,1,1,2,3,1,2,1,2,3,1,1,2,3]

I could write a dumb loop, but is there a clever or built-in way to do this in numpy/pandas?

Answer 1

A pandas way, assuming arr is a NumPy array (for example arr = np.asarray(arr), since the comparison below does not work elementwise on a plain list), would be:

In [55]: I = np.r_[False,arr[:-1]!=arr[1:]].cumsum()

In [56]: df = pd.DataFrame({'ids':I,'val':np.ones(len(arr),dtype=int)})

In [57]: df.groupby('ids')[['val']].cumsum().values.ravel()
Out[57]: array([1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3])

Another option uses a custom NumPy helper, intervaled_ranges, which creates ranges from interval lengths/sizes (it is not part of NumPy and has to be defined separately):

In [91]: m = np.r_[True,arr[:-1]!=arr[1:],True]

In [92]: intervaled_ranges(np.diff(np.flatnonzero(m)),start=1)
Out[92]: array([1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3])
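intervaled_ranges is not defined in this answer. A minimal sketch of such a helper, assuming it takes an array of positive run lengths and returns one counting range per run, each beginning at start. The body below is an illustrative reimplementation and may differ from the answerer's own code:

```python
import numpy as np

def intervaled_ranges(lengths, start=0):
    """Concatenate ranges start, start+1, ... of the given lengths.

    Hypothetical reimplementation for illustration; assumes every length >= 1.
    """
    lengths = np.asarray(lengths, dtype=int)
    if lengths.size == 0:
        return np.empty(0, dtype=int)
    # A step of +1 everywhere makes the cumulative sum count upward.
    steps = np.ones(lengths.sum(), dtype=int)
    steps[0] = start
    # At the first element of each later run, step back down to `start`.
    steps[lengths.cumsum()[:-1]] = 1 - lengths[:-1]
    return steps.cumsum()

arr = np.array([1, 2, 3, 3, 4, 5, 6, 1, 1, 1, 5, 5, 8, 8, 8, 9, 4, 4, 4])
# True at the first element of every run, plus a sentinel True at the end.
m = np.r_[True, arr[:-1] != arr[1:], True]
# Distances between consecutive True positions are the run lengths.
run_lengths = np.diff(np.flatnonzero(m))
res = intervaled_ranges(run_lengths, start=1)
```

The sentinel True appended to m is what lets np.diff recover the length of the final run.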

Answer 2

You can also use pd.Series and groupby:

import pandas as pd

s = pd.Series([1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4])

print((s.groupby((s != s.shift()).cumsum()).cumcount() + 1).tolist())
# [1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3]
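To see why this one-liner works, it may help to split it into its three steps. The intermediate names (is_run_start, run_id, streak) are made up here for illustration and do not appear in the answer:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 3, 4, 5, 6, 1, 1, 1, 5, 5, 8, 8, 8, 9, 4, 4, 4])

# True wherever a value differs from its predecessor, i.e. a new run starts.
# The first element is compared against NaN, so it is always True.
is_run_start = s != s.shift()

# The cumulative sum of those flags gives every run its own integer label.
run_id = is_run_start.cumsum()

# cumcount() numbers the rows 0, 1, 2, ... within each run; adding 1 turns
# that position into the length of the streak so far.
streak = s.groupby(run_id).cumcount() + 1
```

Grouping by run_id rather than by the values themselves is what keeps the two separate runs of 1 (and of 4) from being merged into one group.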

Answer 1
1 2019-11-12 08:07:01

Answer 2
1 Accepted 2019-11-12 08:52:50
