How to count the number of occurrences before a particular value in dataframe python?
I have a dataframe like below:
A B C
1 1 1
2 0 1
3 0 0
4 1 0
5 0 1
6 0 1
7 1 0
I want the number of occurrences of zeroes in df['B'] under the following condition:
if df['B'] < df['C']:
    # count the number of zeroes in df['B'] until it sees a 1
Expected output:
A B C output
1 1 1 NaN
2 0 1 1
3 0 0 NaN
4 1 0 NaN
5 0 1 1
6 0 1 0
7 1 0 NaN
I don't know how to formulate the count part. Any help is really appreciated.
Using some masking and a groupby on your reversed series. This assumes binary data (only 0 and 1).
m = df['B'][::-1].eq(0)
d = m.groupby(m.ne(m.shift()).cumsum()).cumsum().sub(1)
d[::-1].where(df['B'] < df['C'])
0 NaN
1 1.0
2 NaN
3 NaN
4 1.0
5 0.0
6 NaN
Name: B, dtype: float64
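To make the chain above easier to follow, here is a step-by-step sketch of the same idea. The frame is rebuilt from the expected-output table in the question, the name `run_id` is only illustrative, and the `astype(int)` cast is my own addition so the grouped cumulative sum works on integers rather than booleans.

```python
import pandas as pd

# Sample frame taken from the expected-output table in the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})

# Step 1: reverse B and flag the zeroes
m = df["B"][::-1].eq(0)

# Step 2: give every run of consecutive equal flags its own label
run_id = m.ne(m.shift()).cumsum()

# Step 3: running count of zeroes inside each run (walking backwards),
# minus one so the current row itself is not counted
d = m.astype(int).groupby(run_id).cumsum().sub(1)

# Step 4: restore the original order and keep only rows where B < C
result = d[::-1].where(df["B"] < df["C"])
print(result)
```

Because the series is reversed before counting, each zero ends up with the number of zeroes that lie between it and the next 1 further down the frame.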
And a fast numpy-based approach:
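import numpy as np

def zero_until_one(a, b):
    n = a.shape[0]
    x = np.flatnonzero(a < b)    # positions where B < C
    y = np.flatnonzero(a == 1)   # positions of the 1s in B
    d = np.searchsorted(y, x)    # index into y of the first 1 after each x
    r = y[d] - x - 1             # zeroes strictly between x and that 1
    out = np.full(n, np.nan)
    out[x] = r
    return out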
zero_until_one(df['B'], df['C'])
array([nan, 1., nan, nan, 1., 0., nan])
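One caveat, as far as I can tell: `zero_until_one` indexes `y[d]` directly, so if a row with B < C has no 1 anywhere after it, `searchsorted` returns `len(y)` and the lookup raises an `IndexError`. Below is a guarded sketch; the name `zero_until_one_safe` is hypothetical, and leaving such rows as NaN is my assumption about the desired behaviour, not something the question specifies.

```python
import numpy as np

def zero_until_one_safe(a, b):
    """Hypothetical guarded variant: rows with no later 1 in `a` stay NaN."""
    a = np.asarray(a)
    b = np.asarray(b)
    x = np.flatnonzero(a < b)    # positions where a < b
    y = np.flatnonzero(a == 1)   # positions of the 1s in a
    d = np.searchsorted(y, x)    # index into y of the first 1 after each x
    out = np.full(a.shape[0], np.nan)
    valid = d < y.size           # True only where a later 1 actually exists
    out[x[valid]] = y[d[valid]] - x[valid] - 1
    return out

# Sample data from the question
sample = zero_until_one_safe([1, 0, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0])
print(sample)

# Trailing zeroes with no closing 1: the flagged row stays NaN instead of raising
tail = zero_until_one_safe([1, 0, 0], [1, 1, 0])
print(tail)
```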
Performance
df = pd.concat([df]*10_000)
%timeit chris1(df)
19.3 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit yatu(df)
12.8 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit zero_until_one(df['B'], df['C'])
2.32 ms ± 31.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
IIUC, one approach would be to use a custom grouper and aggregate with groupby.cumcount:
c1 = df.B.lt(df.C)
g = df.B.eq(1).cumsum()
df['out'] = c1.groupby(g).cumcount(ascending=False).shift().where(c1).sub(1)
print(df)
A B C out
0 1 1 1 NaN
1 2 0 1 1.0
2 3 0 0 NaN
3 4 1 0 NaN
4 5 0 1 1.0
5 6 0 1 0.0
6 7 1 0 NaN
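In case the chain is hard to read, this sketch prints the intermediate columns side by side. The frame is rebuilt from the output above; the names `remaining` and `steps` are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the printed output above
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})

c1 = df["B"].lt(df["C"])       # rows that satisfy B < C
g = df["B"].eq(1).cumsum()     # custom grouper: a new group starts at every 1

# Number of rows still to come within the same group
remaining = c1.groupby(g).cumcount(ascending=False)

# Shift down one row, mask with the condition, then subtract one
out = remaining.shift().where(c1).sub(1)

steps = pd.DataFrame({
    "B": df["B"], "C": df["C"], "c1": c1,
    "g": g, "remaining": remaining, "out": out,
})
print(steps)
```

Each group begins at a 1 and runs until just before the next 1, so the descending count tells how many rows are left in the current stretch.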
Let us push it into a one-liner:
df.groupby(df.B.iloc[::-1].cumsum()).cumcount(ascending=False).shift(-1).where(df.B<df.C)
Out[80]:
0 NaN
1 1.0
2 NaN
3 NaN
4 1.0
5 0.0
6 NaN
dtype: float64
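For cross-checking any of the vectorized answers, a plain loop that follows the wording of the question can serve as a reference. This is only a sketch: the function name `zeros_until_one_loop` is hypothetical, and returning NaN when no 1 follows is my assumption. It is quadratic in the worst case, so it is meant for verification rather than speed.

```python
import math

def zeros_until_one_loop(b, c):
    """Hypothetical reference: for rows with b < c, count the values after the row until the next 1."""
    b = list(b)
    c = list(c)
    out = []
    for i in range(len(b)):
        if not b[i] < c[i]:
            out.append(math.nan)   # condition not met
            continue
        count = 0
        j = i + 1
        while j < len(b) and b[j] != 1:
            count += 1             # with binary data, every non-1 value is a zero
            j += 1
        # Assumption: if no 1 is found after row i, the result is left undefined
        out.append(count if j < len(b) else math.nan)
    return out

# Columns B and C from the sample frame
reference = zeros_until_one_loop([1, 0, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0])
print(reference)
```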