Count groups of consecutive 1s in pandas

I have a list of 1s and 0s, and I would like to count the number of groups of consecutive 1s.

mylist = [0,0,1,1,0,1,1,1,1,0,1,0]

Counting by hand gives 3 groups, but is there a way to do it in Python?

Here I count every jump from 0 to 1. Prepending a 0 ensures that a run of 1s at the very start of the list is counted too.

import numpy as np

mylist_arr = np.array([0] + [0,0,1,1,0,1,1,1,1,0,1,0])
diff = np.diff(mylist_arr)
count = np.sum(diff == 1)
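
As a rough sketch, the same idea can be wrapped in a reusable function. The name `count_groups_np` is hypothetical (it does not appear in the answer above), and the input is assumed to contain only 0s and 1s.

```python
import numpy as np

def count_groups_np(values):
    """Count runs of consecutive 1s in a sequence of 0s and 1s."""
    # Prepend a 0 so that a run starting at index 0 still produces a 0 -> 1 jump.
    padded = np.concatenate(([0], np.asarray(values, dtype=int)))
    # np.diff is +1 where a run of 1s starts and -1 where one ends.
    return int(np.sum(np.diff(padded) == 1))

print(count_groups_np([0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0]))  # 3
print(count_groups_np([1, 1, 0, 1]))                          # 2
```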

You can try this:

import numpy as np
import pandas as pd
df=pd.DataFrame(data = [0,0,1,1,0,1,1,1,1,0,1,0])
df['Gid']=df[0].diff().eq(1).cumsum()
df=df[df[0].eq(1)]
df.groupby('Gid').size()
Out[245]: 
Gid
1    2
2    4
3    1
dtype: int64

sum(df.groupby('Gid').size())/len(df.groupby('Gid').size())
Out[244]: 2.3333333333333335

Here's one solution:

durations = []

for n, d in enumerate(mylist):
    if (n == 0 and d == 1) or (n > 0 and mylist[n-1] == 0 and d == 1):
        durations.append(1)
    elif d == 1:
        durations[-1] += 1

def mean(x):
    return sum(x)/len(x)

print(durations)
print(mean(durations))

Option 1

With pandas. First, initialise a dataframe:

In [78]: df
Out[78]: 
    Col1
0      0
1      0
2      1
3      1
4      0
5      1
6      1
7      1
8      1
9      0
10     1
11     0

Now divide the total number of 1s by the number of groups to get the mean group length:

In [79]: df.sum() / df.diff().eq(1).cumsum().max()
Out[79]: 
Col1    2.333333
dtype: float64

If you want just the number of groups, df.diff().eq(1).cumsum().max() is enough.
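
One caveat: `diff()` yields NaN in the first row, so a run of 1s that starts in the very first row is not detected by `diff().eq(1)`. A minimal sketch of one way around that, assuming a Series of 0s and 1s, compares each value with its shifted predecessor instead. The name `count_groups_pd` is hypothetical.

```python
import pandas as pd

def count_groups_pd(series):
    """Count runs of consecutive 1s in a pandas Series of 0s and 1s."""
    # shift(fill_value=0) treats the position before the first row as 0,
    # so a run that starts in the first row is counted as well.
    starts = series.eq(1) & series.shift(fill_value=0).eq(0)
    return int(starts.sum())

s = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0])
print(count_groups_pd(s))                        # 3
print(s.sum() / count_groups_pd(s))              # mean group length
print(count_groups_pd(pd.Series([1, 1, 0, 1])))  # 2
```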


Option 2

With itertools.groupby:

In [88]: sum(array) / sum(1 if sum(g) else 0 for  _, g in  itertools.groupby(array))
Out[88]: 2.3333333333333335

If you want just the number of groups, sum(1 if sum(g) else 0 for _, g in itertools.groupby(array)) is enough.
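
For reference, here is a self-contained sketch of the same approach that also keeps the individual run lengths. It assumes `array` is the list from the question; `run_lengths` is a name introduced only for illustration.

```python
import itertools

array = [0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

# groupby yields one (key, run) pair per run of equal values;
# keeping the pairs whose key is 1 gives the runs of 1s.
run_lengths = [len(list(run)) for key, run in itertools.groupby(array) if key == 1]

print(run_lengths)                          # [2, 4, 1]
print(len(run_lengths))                     # 3
print(sum(run_lengths) / len(run_lengths))  # 2.3333333333333335
```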

You can try this:

mylist = [0,0,1,1,0,1,1,1,1,0,1,0]
previous = 0  # treat the position before the list as 0 so a leading group is counted
count = 0

for i in mylist:
    if i == 1 and previous == 0:
        count += 1  # a new group of 1s starts here
    previous = i

print(count)

Output:

3

Take a look at itertools.groupby:

import itertools
import operator

def get_1_groups(ls):
    return sum(map(operator.itemgetter(0), itertools.groupby(ls)))

This works because itertools.groupby returns (the iterable equivalent of):

itertools.groupby([0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0])
# ==>
[(0, [0, 0]), (1, [1, 1]), (0, [0]), (1, [1, 1, 1, 1]), (0, [0]), (1, [1]), (0, [0])]

So you are just summing the first item of each pair.

If the list can contain values other than 0 and 1, those values would also add to the sum.

You can do something like this instead:

def count_groups(ls, target=1):
    return sum(target == value for value, _ in itertools.groupby(ls))
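
To illustrate the difference, here is a small sketch on a list that also contains 2s. The sample `data` is made up for this example; both functions are repeated from above so the block runs on its own.

```python
import itertools
import operator

def get_1_groups(ls):
    # Sums the key of every run, so it is only a group count for 0/1 input.
    return sum(map(operator.itemgetter(0), itertools.groupby(ls)))

def count_groups(ls, target=1):
    # Counts only the runs whose key equals target.
    return sum(target == value for value, _ in itertools.groupby(ls))

data = [0, 2, 2, 1, 1, 0, 2, 1]  # run keys: 0, 2, 1, 0, 2, 1

print(get_1_groups(data))            # 6 (the 2s inflate the sum)
print(count_groups(data))            # 2 runs of 1s
print(count_groups(data, target=2))  # 2 runs of 2s
```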

This can be done with little work by counting how many times the list transitions from 0 to 1 (counting rising signal edges):

count = 0
last = 0
for element in mylist:
    if element != last:
        last = element
        if element:  # 1 is truthy
            count += 1
print(count)
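
The same rising-edge count can also be sketched without an explicit loop variable by pairing each element with its predecessor. This is one possible variant, not the answer's own code, and `rising_edges` is a name chosen for illustration.

```python
mylist = [0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

# Pair every element with its predecessor; the list is padded with a
# leading 0 so that a run starting at index 0 counts as a rising edge.
rising_edges = sum(
    1 for prev, cur in zip([0] + mylist, mylist) if prev == 0 and cur == 1
)
print(rising_edges)  # 3
```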

Here is my solution:

c is the list to operate on. This tracks the length of the longest run of 1s rather than the number of groups.

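c = [1,0,1,1,1,0]
longest = 0   # length of the longest run of 1s seen so far
counter = 0   # length of the current run of 1s

for j in c:
    if j == 1:
        counter += 1
    else:
        if counter > longest:
            longest = counter
        counter = 0   # always reset when a 0 ends the run

# the list may end with a run of 1s
if counter > longest:
    longest = counter

print(longest)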

A quick and dirty one-liner (almost):

import re
mylist = [0,0,1,1,0,1,1,1,1,0,1,0]
print(len(re.sub(r'0+', '0', ''.join(str(x) for x in mylist)).strip('0').split('0')))
3

Step by step:

import re
mylist = [0,0,1,1,0,1,1,1,1,0,1,0]
sal1 = ''.join(str(x) for x in mylist) # build a string from the list
sal2 = re.sub(r'0+', '0', sal1)   # collapse each run of zeros into a single 0
sal3 = sal2.strip('0')            # remove 0s from the start and the end of the string
sal4 = len(sal3.split('0'))       # split the string on '0' into a list and take its length

This gives:

sal1 -> 001101111010
sal2 -> 01101111010
sal3 -> 110111101
sal4 -> 3
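
Note that the split-based version returns 1 for a list with no 1s at all, because `''.split('0')` is `['']`. A minimal alternative sketch, assuming the list holds only 0s and 1s, matches the runs of 1s directly. The name `count_groups_re` is hypothetical.

```python
import re

def count_groups_re(values):
    """Count runs of consecutive 1s by matching them in a string."""
    s = ''.join(str(x) for x in values)  # e.g. '001101111010'
    # Each non-overlapping match of '1+' is one maximal run of 1s.
    return len(re.findall(r'1+', s))

print(count_groups_re([0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0]))  # 3
print(count_groups_re([0, 0, 0]))                             # 0
```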
