Count cumulative and sequential values of the same sign in Pandas series
I wrote this code to compute the time since the last sign change (from positive to negative or vice versa) in each DataFrame column.
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, -4, 5, 1, -2, -4, 1, 3, 2, -4, -5, -5, -6, -1]})

for column in df.columns:
    days_since_sign_change = [0]
    for k in range(1, len(df[column])):
        # position of the most recent earlier row whose sign differs from row k
        last_different_sign_index = np.where(np.sign(df[column][:k]) != np.sign(df[column][k]))[0][-1]
        days_since_sign_change.append(abs(last_different_sign_index - k))
    df[column + '_days_since_sign_change'] = days_since_sign_change
    # this final stage allows the "days_since_sign_change" column to also indicate if the sign changed
    # from negative to positive or from positive to negative.
    df.loc[df[column] < 0, column + '_days_since_sign_change'] *= -1
In [302]: df
Out[302]:
     x  x_days_since_sign_change
0    1                         0
1   -4                        -1
2    5                         1
3    1                         2
4   -2                        -1
5   -4                        -2
6    1                         1
7    3                         2
8    2                         3
9   -4                        -1
10  -5                        -2
11  -5                        -3
12  -6                        -4
13  -1                        -5
Issue: with large datasets (150,000 * 50,000), the Python code is extremely slow. How can I speed this up?
You can surely do this without a loop. Create a sign column holding 1 where the value in x is greater than 0 and -1 otherwise. Then split that sign column into groups, starting a new group whenever the value in the current row differs from the previous one, and take the cumulative sum within each group.
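# 1 where x is positive, -1 otherwise
df['x_days_since_sign_change'] = (df['x'] > 0).astype(int).replace(0, -1)
# the first row has no earlier sign change, so its counter is set to 0
df.iloc[0, 1] = 0
# new group at every sign flip; cumsum() runs on every column, so x is accumulated per run too
df.groupby((df['x_days_since_sign_change'] != df['x_days_since_sign_change'].shift()).cumsum()).cumsum()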
      x  x_days_since_sign_change
0     1                         0
1    -4                        -1
2     5                         1
3     6                         2
4    -2                        -1
5    -6                        -2
6     1                         1
7     4                         2
8     6                         3
9    -4                        -1
10   -9                        -2
11  -14                        -3
12  -20                        -4
13  -21                        -5
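In the output above the x column has also been summed within each run, because cumsum() is applied to every column of the frame. Here is a minimal sketch of the same idea that leaves x untouched by working on a standalone sign Series. The names sign, run_id and signed_run_length are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, -4, 5, 1, -2, -4, 1, 3, 2, -4, -5, -5, -6, -1]})

# 1 where x is positive, -1 otherwise (zeros are treated as negative, as in the answer)
sign = df['x'].gt(0).map({True: 1, False: -1})
# the first row has no earlier sign change, so it contributes 0
sign.iloc[0] = 0

# run_id increments each time the sign differs from the previous row
run_id = sign.ne(sign.shift()).cumsum()

# summing +1 or -1 within each run gives the signed length of the run so far
signed_run_length = sign.groupby(run_id).cumsum()

df['x_days_since_sign_change'] = signed_run_length
```

Only the new column is written back, so the original values in x stay available for further work.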
You can use cumcount:
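positive = df.x.gt(0)
run_id = positive.astype(int).diff().ne(0).cumsum()  # increments whenever the sign flips
# 1-based position within each run, multiplied by the run's sign
s = df.groupby(run_id).cumcount().add(1) * positive.map({True: 1, False: -1})
s.iloc[0] = 0  # the first row has no earlier sign change
s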
Out[645]:
0 0
1 -1
2 1
3 2
4 -1
5 -2
6 1
7 2
8 3
9 -1
10 -2
11 -3
12 -4
13 -5
dtype: int64
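The question loops over every column of a wide frame, so the same run counter has to be applied column by column. One possible sketch does the counting in plain NumPy and builds a new frame from the results. The helper signed_run_length and the extra column y are assumptions made for illustration, and no timing has been measured here.

```python
import numpy as np
import pandas as pd

def signed_run_length(values):
    """Signed 1-based position of each element within its run of same-sign values."""
    values = np.asarray(values)
    if values.size == 0:
        return np.zeros(0, dtype=int)
    sign = np.where(values > 0, 1, -1)  # zeros are treated as negative
    idx = np.arange(values.size)
    # a run starts at position 0 and wherever the sign differs from the previous element
    is_start = np.r_[True, sign[1:] != sign[:-1]]
    # carry the start index of the current run forward to every element of that run
    run_start = np.maximum.accumulate(np.where(is_start, idx, 0))
    out = (idx - run_start + 1) * sign
    out[0] = 0  # the first row has no earlier sign change
    return out

x = [1, -4, 5, 1, -2, -4, 1, 3, 2, -4, -5, -5, -6, -1]
df = pd.DataFrame({'x': x, 'y': [-v for v in x]})  # y is a hypothetical second column

result = pd.DataFrame(
    {f'{col}_days_since_sign_change': signed_run_length(df[col].to_numpy()) for col in df.columns},
    index=df.index,
)
```

This follows the cumcount answer's convention of zeroing only the first row after counting, so a first run longer than one row continues at 2, 3, and so on.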