I'm looking to find the longest run of consecutive zeros in my pandas DataFrame. I have a DataFrame with 10 columns, each with 25000 rows that contain either a null, a zero or a non-zero number. For each column, I am looking to calculate:
1. The length of the longest run of consecutive zeros.
2. The length of the longest run of consecutive zeros AND nulls.
E.g. if the first column was:
[col1: 1,2,4,5,6,2,3,0,0,0,0,1,2,... (remaining all numbers)]
it would return 4.
Thanks
Setup
Consider the dataframe df:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(dict(
        col0=[1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 9],
        col1=[1, 2, 3, 0, 0, 4, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 2, 0, 4, 8, 9]
    ))
Solution
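    def max_zeros(c):
        # True where the value is non-zero (NaN != 0 is True, so nulls break a run)
        v = c.values != 0
        # Pad both ends with True so a run of zeros touching either edge is closed
        edges = np.flatnonzero(np.diff(np.concatenate([[True], v, [True]])))
        # Edges alternate start, end, start, ...; every other gap is a zero run
        d = np.diff(edges)[::2]
        return d.max() if d.size else 0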

    df.apply(max_zeros)

    col0    6
    col1    3
    dtype: int64
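
The function above counts zeros only: a null compares as non-zero, so it ends a run. For the second part of the question (zeros AND nulls), one possible sketch is to build the boolean mask explicitly and measure runs of True in it. The helper name `longest_run` and the `sample` frame below are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

def longest_run(mask):
    """Length of the longest run of True values in a 1-D boolean array."""
    # Pad with False on both sides so every run has a start and an end edge.
    padded = np.concatenate([[False], np.asarray(mask, dtype=bool), [False]])
    # Edges are the positions where the padded mask changes value.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate start, end, start, end, ...; pair them up.
    lengths = edges[1::2] - edges[0::2]
    return int(lengths.max()) if lengths.size else 0

# Hypothetical sample data containing zeros and nulls.
sample = pd.DataFrame({
    "a": [1, 0, 0, np.nan, 0, 2, 0, 0],
    "b": [np.nan, np.nan, 3, 0, 1, 0, np.nan, np.nan],
})

# Part 1: zeros only. Part 2: zeros or nulls.
zeros_only = sample.apply(lambda c: longest_run(c.eq(0)))
zeros_or_nulls = sample.apply(lambda c: longest_run(c.eq(0) | c.isna()))
print(zeros_only)
print(zeros_or_nulls)
```

Keeping the mask separate from the run-length logic means the same helper can serve both parts of the question; only the condition passed in changes.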
If you have a dataframe like

    df = pd.DataFrame([[1, 2, 4, 5, 6, 2, 3, 0, 0, 0, 0, 1, 2],
                       [1, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1, 2]])

you can use itertools.groupby to measure each run of zeros along a row:
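
    from itertools import groupby

    def get_conti(a):
        # Collect the length of every run of consecutive zeros in the row
        m = []
        for key, run in groupby(a):
            if key == 0:
                m.append(len(list(run)))
        # default=0 covers rows that contain no zeros at all
        return max(m, default=0)

    # axis=1 applies the function to each row
    df['max'] = df.apply(get_conti, axis=1)
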
Output:

       0  1  2  3  4  5  6  7  8  9  10  11  12  max
    0  1  2  4  5  6  2  3  0  0  0   0   1   2    4
    1  1  0  0  2  0  2  0  0  0  0   0   1   2    5
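
The example above works row by row, while the question asks for one value per column. A minimal sketch of the same itertools idea applied column-wise follows; the name `longest_zero_run` and the `cols` frame are hypothetical, chosen only for illustration.

```python
from itertools import groupby

import pandas as pd

def longest_zero_run(values):
    """Length of the longest run of consecutive zeros; 0 if there is none."""
    # groupby yields (value, run) pairs for each stretch of equal neighbours.
    runs = (sum(1 for _ in run) for key, run in groupby(values) if key == 0)
    return max(runs, default=0)

# Hypothetical frame: "x" has zero runs of length 3 and 1, "y" has no zeros.
cols = pd.DataFrame({
    "x": [1, 0, 0, 0, 5, 0, 2],
    "y": [4, 3, 2, 1, 9, 8, 7],
})

# The default axis=0 passes each column to the function.
per_column = cols.apply(longest_zero_run)
print(per_column)
```

Using `default=0` avoids the error that `max` raises on an empty sequence when a column contains no zeros.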