Count occurrences within a specific range
I have a DataFrame that looks like this:
Tag
0 skip_1
1 run
2 skip_1
3 run
4 skip_1
5 run
6 skip_2
7 run
8 skip_1
9 run
10 skip_2
11 jump
12 skip_1
13 run
14 skip_2
15 jump
16 skip_1
17 run
18 skip_2
19 cleanup_jump
20 skip_1
21 run
22 skip_2
23 run
24 skip_2
25 jump
26 skip_1
27 run
28 skip_2
29 jump
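For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the variable names `tags` and `df` are my own choice, and the default 0..29 integer index is assumed.

```python
import pandas as pd

# Tag values copied row by row from the listing above.
tags = [
    "skip_1", "run", "skip_1", "run", "skip_1", "run",    # rows 0-5
    "skip_2", "run", "skip_1", "run", "skip_2", "jump",   # rows 6-11
    "skip_1", "run", "skip_2", "jump",                    # rows 12-15
    "skip_1", "run", "skip_2", "cleanup_jump",            # rows 16-19
    "skip_1", "run", "skip_2", "run", "skip_2", "jump",   # rows 20-25
    "skip_1", "run", "skip_2", "jump",                    # rows 26-29
]

# A single-column frame with the default RangeIndex (assumed).
df = pd.DataFrame({"Tag": tags})
print(df["Tag"].value_counts())
```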
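First, I want to count the number of RUN occurrences between two JUMP events, and then number those occurrences within that range from the most recent to the earliest. The expected result would be: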
Tag Jump_Run_Count Run_Order
0 skip_1 0 0
1 run 0 5
2 skip_1 0 0
3 run 0 4
4 skip_1 0 0
5 run 0 3
6 skip_2 0 0
7 run 0 2
8 skip_1 0 0
9 run 0 1
10 skip_2 0 0
11 jump 5 0
12 skip_1 0 0
13 run 0 1
14 skip_2 0 0
15 jump 1 0
16 skip_1 0 0
17 run 0 0
18 skip_2 0 0
19 cleanup_jump 0 0
20 skip_1 0 0
21 run 0 2
22 skip_2 0 0
23 run 0 1
24 skip_2 0 0
25 jump 2 0
26 skip_1 0 0
27 run 0 1
28 skip_2 0 0
29 jump 1 0
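One complication is that the first RUN occurrences do not lie between two JUMPs, but between the start of the column and the first JUMP.
Second, I want to do the same counting and numbering for the range between a JUMP and a CLEANUP_JUMP, and store the results in separate columns.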
Tag Jump_Run_Count Run_Order Cleanup_Jump_Dig_Count Run_Order2
0 skip_1 0 0 0 0
1 run 0 5 0 0
2 skip_1 0 0 0 0
3 run 0 4 0 0
4 skip_1 0 0 0 0
5 run 0 3 0 0
6 skip_2 0 0 0 0
7 run 0 2 0 0
8 skip_1 0 0 0 0
9 run 0 1 0 0
10 skip_2 0 0 0 0
11 jump 5 0 0 0
12 skip_1 0 0 0 0
13 run 0 1 0 0
14 skip_2 0 0 0 0
15 jump 1 0 0 0
16 skip_1 0 0 0 0
17 run 0 0 0 1
18 skip_2 0 0 0 0
19 cleanup_jump 0 0 1 0
20 skip_1 0 0 0 0
21 run 0 2 0 0
22 skip_2 0 0 0 0
23 run 0 1 0 0
24 skip_2 0 0 0 0
25 jump 2 0 0 0
26 skip_1 0 0 0 0
27 run 0 1 0 0
28 skip_2 0 0 0 0
29 jump 1 0 0 0
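I added some pictures that might explain it better: [images not available]
Any help with how to code this, or even a different approach to the problem, would be highly appreciated.
Thanks!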
Here is a solution using pandas:
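import numpy as np
import pandas as pd

# Assumes df is the frame from the question, with a default 0..n-1 index
# and a first row that is not a jump.
# Flag the run rows.
df['run'] = df['Tag'] == 'run'
# Treat cleanup_jump like jump so that both start a new group.
val_mask = df['Tag'].replace({'cleanup_jump': 'jump'}) == 'jump'
# Group id: increases by one at every (cleanup_)jump row.
df['tag_id'] = val_mask.cumsum()
# Runs per group, written onto the jump row that follows the group.
# The last group (from the final jump onward) has no such row, hence [:-1].
df.loc[val_mask, 'Jump_Count'] = df.groupby('tag_id')['run'].sum().to_numpy()[:-1]
# Number the runs 1..n inside each group, then rank descending so that
# the run closest to the next jump gets 1.
df.loc[df['run'], 'run_per_jump'] = df.loc[df['run']].groupby('tag_id')['run'].cumsum()
df['Jump_Run_Order'] = df.groupby('tag_id')['run_per_jump'].rank(method='dense', ascending=False)
# For each cleanup_jump, find the position of the last plain jump before it.
jumps_idx = np.flatnonzero(df['Tag'] == 'jump')
cj_idxs = np.flatnonzero(df['Tag'] == 'cleanup_jump')
cj_help_idxs = np.asarray([np.max(jumps_idx[jumps_idx < cj_idx]) for cj_idx in cj_idxs])
# Move the values of each jump..cleanup_jump range into the cleanup columns.
for start, end in zip(cj_help_idxs + 1, cj_idxs):
    df.loc[start:end, 'Cleanup_Jump_Run_Order'] = df.loc[start:end, 'Jump_Run_Order']
    df.loc[start:end, 'Cleanup_Jump_Count'] = df.loc[start:end, 'Jump_Count']
    df.loc[start:end, 'Jump_Run_Order'] = 0
    df.loc[start:end, 'Jump_Count'] = 0
# Drop the helper columns and turn the remaining NaNs into integer zeros.
df = df.drop(columns=['tag_id', 'run', 'run_per_jump']).fillna(0).convert_dtypes(convert_integer=True)
print(df)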
Tag Jump_Count Jump_Run_Order Cleanup_Jump_Run_Order Cleanup_Jump_Count
0 skip_1 0 0 0 0
1 run 0 5 0 0
2 skip_1 0 0 0 0
3 run 0 4 0 0
4 skip_1 0 0 0 0
5 run 0 3 0 0
6 skip_2 0 0 0 0
7 run 0 2 0 0
8 skip_1 0 0 0 0
9 run 0 1 0 0
10 skip_2 0 0 0 0
11 jump 5 0 0 0
12 skip_1 0 0 0 0
13 run 0 1 0 0
14 skip_2 0 0 0 0
15 jump 1 0 0 0
16 skip_1 0 0 0 0
17 run 0 0 1 0
18 skip_2 0 0 0 0
19 cleanup_jump 0 0 0 1
20 skip_1 0 0 0 0
21 run 0 2 0 0
22 skip_2 0 0 0 0
23 run 0 1 0 0
24 skip_2 0 0 0 0
25 jump 2 0 0 0
26 skip_1 0 0 0 0
27 run 0 1 0 0
28 skip_2 0 0 0 0
29 jump 1 0 0 0
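If you would rather avoid the explicit loop, the same columns can probably be built with grouped operations only. The following is a minimal sketch, not a drop-in replacement: `annotate_runs` is a hypothetical helper name of mine, and it assumes the tags are exactly `run`, `jump` and `cleanup_jump` as in the question. Runs that come after the last jump are simply left at 0.

```python
import pandas as pd

def annotate_runs(df, tag_col="Tag"):
    """Hypothetical helper: add count/order columns for each closing tag."""
    out = df.copy()
    tag = out[tag_col]
    is_run = tag.eq("run")
    is_boundary = tag.isin(["jump", "cleanup_jump"])

    # Segment id: a new segment starts on the row after each boundary,
    # so every boundary row closes its own segment.
    seg = is_boundary.astype(int).shift(fill_value=0).cumsum()
    # Tag of the boundary that closes each row's segment
    # (NaN for rows after the last boundary).
    closer = tag.where(is_boundary).bfill()

    runs = is_run.astype(int)
    # Total number of runs in the segment, repeated on every row of it.
    n_runs = runs.groupby(seg).transform("sum")
    # 1 for the run nearest the closing boundary, increasing going backwards.
    order = n_runs - runs.groupby(seg).cumsum() + 1

    for prefix, closing_tag in [("Jump", "jump"), ("Cleanup_Jump", "cleanup_jump")]:
        # The count goes on the closing row only.
        out[f"{prefix}_Count"] = n_runs.where(tag.eq(closing_tag), 0)
        # The order goes on run rows whose segment ends with this tag.
        out[f"{prefix}_Run_Order"] = order.where(is_run & closer.eq(closing_tag), 0)
    return out

# Small made-up example, not the data from the question.
demo = pd.DataFrame({"Tag": ["run", "run", "jump", "run", "cleanup_jump", "run", "jump"]})
print(annotate_runs(demo))
```

The idea is that `bfill` tells each row which kind of jump closes its range, so the jump and cleanup_jump columns can be filled from the same intermediate series without moving values around afterwards.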