
How to find the longest consecutive string of values in pandas dataframe

I'm looking to find the longest run of consecutive zeros in my pandas DataFrame. It has 10 columns, each with 25,000 rows, and every cell holds either a null, a zero, or a non-zero number. I want to calculate:

 1. The longest run of consecutive zeros in each column.
 2. The longest run of consecutive values that are either zero or null in each column.

For example, if the first column were:

col1: [1, 2, 4, 5, 6, 2, 3, 0, 0, 0, 0, 1, 2, ...] (the remaining values are all non-zero numbers)

the result would be 4.

Thanks

Setup

Consider the dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    col0=[1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 9],
    col1=[1, 2, 3, 0, 0, 4, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 2, 0, 4, 8, 9]
))

Solution

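def max_zeros(c):
    # True where the value is non-zero (NaN != 0 is True, so a NaN breaks a run)
    v = c.values != 0
    # Pad with True on both ends so every run of zeros has a start and an end edge
    edges = np.flatnonzero(np.diff(np.concatenate([[True], v, [True]])))
    # Gaps between edges alternate zero run, non-zero run, ...; keep the zero runs
    d = np.diff(edges)
    # initial=0 returns 0 for a column without zeros instead of raising
    return d[::2].max(initial=0)

df.apply(max_zeros)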

col0    6
col1    3
dtype: int64
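The question's second part (zeros and nulls counted together) can be handled with the same edge-detection idea. A minimal sketch, assuming NumPy and pandas: the hypothetical helper longest_run takes any boolean mask, so passing c.eq(0) counts zeros only and passing c.eq(0) | c.isna() counts zeros and nulls. Padding the mask with False on both sides guarantees every run has a start edge and an end edge, including a run that reaches the last row. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd


def longest_run(mask):
    """Length of the longest run of True values in a 1-D boolean mask (hypothetical helper)."""
    # Pad with False so every run of True has both a start edge and an end edge.
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    # Indices where the value flips; they alternate start, end, start, end, ...
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Run length = end - start; initial=0 covers a mask with no True values.
    return (edges[1::2] - edges[::2]).max(initial=0)


# Made-up sample with nulls mixed into the zero runs.
df = pd.DataFrame(dict(
    col0=[1, 0, 0, np.nan, 0, 2, 0, 0],
    col1=[0, np.nan, np.nan, 0, 3, 0, 0, 0],
))

zeros_only = df.apply(lambda c: longest_run(c.eq(0)))
zeros_or_nulls = df.apply(lambda c: longest_run(c.eq(0) | c.isna()))
print(zeros_only)
print(zeros_or_nulls)
```

With this sample, zeros_only is 2 for col0 and 3 for col1 (the trailing run of three zeros is counted), while zeros_or_nulls is 4 for both columns because the nulls bridge neighbouring zero runs.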

If you have a dataframe like

df = pd.DataFrame([[1, 2, 4, 5, 6, 2, 3, 0, 0, 0, 0, 1, 2],
                   [1, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1, 2]])

You can use itertools.groupby, which groups consecutive equal values:

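from itertools import groupby

def get_conti(a):
    m = []
    # groupby yields (value, run) pairs, one per run of equal consecutive values
    for key, grp in groupby(a):
        if key == 0:
            m.append(sum(1 for _ in grp))
    # default=0 avoids a ValueError when the row contains no zeros
    return max(m, default=0)

df['max'] = df.apply(get_conti, axis=1)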

Output:

   0  1  2  3  4  5  6  7  8  9  10  11  12  max
0  1  2  4  5  6  2  3  0  0  0   0   1   2    4
1  1  0  0  2  0  2  0  0  0  0   0   1   2    5
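For the row-wise layout, a pandas-only alternative (a swapped-in technique, not part of the original answer) labels runs with the shift/cumsum idiom and sums the boolean mask inside each run. This is a sketch under stated assumptions: the helper name longest_run_series is hypothetical, and the NaN placed in row 0 is made-up data added only to show the zeros-and-nulls case.

```python
import numpy as np
import pandas as pd


def longest_run_series(mask: pd.Series) -> int:
    """Longest run of True values in a boolean Series (hypothetical helper)."""
    # A new run id starts wherever the value differs from the previous one;
    # fill_value=False keeps the shifted Series boolean instead of object.
    run_id = mask.ne(mask.shift(fill_value=False)).cumsum()
    # Summing the mask inside each run gives its length (0 for runs of False).
    return int(mask.groupby(run_id).sum().max())


# Rows from the answer above, with one NaN added to row 0 (made-up data).
df = pd.DataFrame([[1, 2, 4, 5, 6, 2, 3, 0, 0, 0, 0, np.nan, 2],
                   [1, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1, 2]])

zeros = df.apply(lambda row: longest_run_series(row.eq(0)), axis=1)
zeros_or_nulls = df.apply(
    lambda row: longest_run_series(row.eq(0) | row.isna()), axis=1
)
print(pd.DataFrame({"zeros": zeros, "zeros_or_nulls": zeros_or_nulls}))
```

Here zeros is 4 and 5 for the two rows, and zeros_or_nulls is 5 and 5, since the NaN extends row 0's run of four zeros by one. Both results are computed before any new column is added, so a 'max' column never leaks into the rows being scanned.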
