X y
a 1.0 -1.0
b -2.0 2.0
c 3.0 -3.0
d 2.1 4.0
Output:
X y
a 1.0 -1.0
b -2.0 2.0
c 3.0 -3.0
d 2.1 4.0
Count 2 1
In the first column, the count resets to 0 at row b because of the -2.0. The result should be a DataFrame with the count appended as the last row.
Let us use cumsum: define a function and apply it to each column.
def yourfun(x):
    return x[x.ge(0)].groupby(x.lt(0).cumsum()).size().iloc[-1]
df.loc['Count'] = df.apply(yourfun)
df
Out[62]:
X y
a 1.0 -1.0
b -2.0 2.0
c 3.0 -3.0
d 2.1 4.0
Count 2.0 1.0
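To make the cumsum trick easier to follow, here is a minimal sketch that breaks the one-liner into its intermediate steps for column X of the example. The names group_id, run_sizes and last_run are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Column X from the example above
x = pd.Series([1.0, -2.0, 3.0, 2.1], index=list("abcd"), name="X")

# Each negative value bumps the running total, so it opens a new group id
group_id = x.lt(0).cumsum()  # a: 0, b: 1, c: 1, d: 1

# Keep only the non-negative values and count how many fall in each group
run_sizes = x[x.ge(0)].groupby(group_id).size()  # group 0 has 1, group 1 has 2

# The last group holds the most recent run of non-negative values
last_run = run_sizes.iloc[-1]
print(last_run)
```

Note that x.ge(0) treats 0 as part of a run, so this version counts non-negative values rather than strictly positive ones.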
There is a pure numpy way without groupby (in other words: likely to be very fast). It returns the length of the longest run of strictly positive values (0 is excluded, so it breaks a run):
import numpy as np
def countpos(x):
    # pad with -1 on both ends, find non-positive indices, measure the gaps
    return np.diff(np.where(np.hstack((-1, x, -1)) <= 0)[0]).max() - 1
df.loc['Count'] = df.apply(countpos)
Result:
>>> df
X y
a 1.0 -1.0
b -2.0 2.0
c 3.0 -3.0
d 2.1 4.0
Count 2.0 1.0
Explanation
The np.where() looks for the indices of all non-positive values. For example:
>>> np.where(np.array([0,0,1,1,0,0,1]) <= 0)
(array([0, 1, 4, 5]),)
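We bracket the actual values with -1 on both sides, to force np.where() to report those boundary indices too. The diff of consecutive indices, minus 1, is the number of strictly positive values sitting between two non-positive ones. Then take the max, et voilà: the maximum length of runs of strictly positive numbers.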
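The steps above can be traced on column X of the example. This is just a sketch of countpos split into pieces; the variable names padded, stops and gaps are made up for illustration.

```python
import numpy as np

# Column X from the example
x = np.array([1.0, -2.0, 3.0, 2.1])

# Sentinels on both ends so runs touching an edge are bounded
padded = np.hstack((-1, x, -1))   # [-1, 1, -2, 3, 2.1, -1]

# Indices of every non-positive value, sentinels included
stops = np.where(padded <= 0)[0]  # [0, 2, 5]

# Distance between neighbouring stops; subtracting 1 gives the run length
gaps = np.diff(stops)             # [2, 3]
longest = gaps.max() - 1
print(longest)
```

The two gaps correspond to the runs [1.0] and [3.0, 2.1], so the longest run of strictly positive values has length 2.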
Speed
df = pd.DataFrame(np.random.uniform(-1, 1, size=(10000,100)))
a = %timeit -o df.apply(countpos)
14.3 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
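Here is another way, as a single pandas chain. It labels every run of same-sign values, so it returns the length of the longest run whether that run is negative or non-negative; it matches the results above when the longest run is a positive one: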
df.loc['count'] = df.lt(0).diff().ne(0).cumsum().stack().groupby(level=1).value_counts().groupby(level=0).max()
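The answers above do not all measure the same thing: the question's wording (the count resets on a negative value) can be read as asking for the run at the end of the column, while countpos returns the longest run anywhere. The plain-Python sketch below is not taken from any of the answers; the function names trailing_positive_run and longest_positive_run are hypothetical and only serve to show where the two readings differ.

```python
def trailing_positive_run(values):
    """Count strictly positive values at the end of the sequence."""
    count = 0
    for v in reversed(values):
        if v <= 0:
            break
        count += 1
    return count


def longest_positive_run(values):
    """Length of the longest run of strictly positive values anywhere."""
    best = current = 0
    for v in values:
        current = current + 1 if v > 0 else 0
        best = max(best, current)
    return best


col_x = [1.0, -2.0, 3.0, 2.1]        # column X from the example
other = [3.0, 4.0, 5.0, -1.0, 2.0]   # made-up data where the readings differ

print(trailing_positive_run(col_x), longest_positive_run(col_x))
print(trailing_positive_run(other), longest_positive_run(other))
```

On the example data both readings give the same count, which is why every answer reproduces the expected output; they only diverge on data like the second list.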