
Count occurrences within a specific range

I have a dataframe that looks like this:

               Tag
0           skip_1
1              run
2           skip_1
3              run
4           skip_1
5              run
6           skip_2
7              run
8           skip_1
9              run
10          skip_2
11            jump
12          skip_1
13             run
14          skip_2
15            jump
16          skip_1
17             run
18          skip_2
19    cleanup_jump
20          skip_1
21             run
22          skip_2
23             run
24          skip_2
25            jump
26          skip_1
27             run
28          skip_2
29            jump

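First, I want to count the RUN occurrences between two JUMP events, and then number those occurrences within that range from the most recent to the earliest. The expected result would be: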

             Tag  Jump_Run_Count  Run_Order
0         skip_1               0          0
1            run               0          5
2         skip_1               0          0
3            run               0          4
4         skip_1               0          0
5            run               0          3
6         skip_2               0          0
7            run               0          2
8         skip_1               0          0
9            run               0          1
10        skip_2               0          0
11          jump               5          0
12        skip_1               0          0
13           run               0          1
14        skip_2               0          0
15          jump               1          0
16        skip_1               0          0
17           run               0          0
18        skip_2               0          0
19  cleanup_jump               0          0
20        skip_1               0          0
21           run               0          2
22        skip_2               0          0
23           run               0          1
24        skip_2               0          0
25          jump               2          0
26        skip_1               0          0
27           run               0          1
28        skip_2               0          0
29          jump               1          0
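A minimal pandas sketch of this first part, assuming a default RangeIndex and that, as the expected output implies, a cleanup_jump also ends a range (so the run at row 17 gets no Run_Order and the jump at row 25 counts only the runs at rows 21 and 23). The helper name add_jump_columns is hypothetical. Both jump types are treated as segment boundaries, only segments closed by a plain jump feed the two columns, and GroupBy.cumcount(ascending=False) numbers the runs from the latest to the earliest.

```python
import pandas as pd

# Sample data from the question (default RangeIndex 0..29).
TAGS = [
    "skip_1", "run", "skip_1", "run", "skip_1",
    "run", "skip_2", "run", "skip_1", "run",
    "skip_2", "jump", "skip_1", "run", "skip_2",
    "jump", "skip_1", "run", "skip_2", "cleanup_jump",
    "skip_1", "run", "skip_2", "run", "skip_2",
    "jump", "skip_1", "run", "skip_2", "jump",
]


def add_jump_columns(df):
    """Hypothetical helper: add Jump_Run_Count and Run_Order to a copy of df."""
    out = df.copy()
    tag = out["Tag"]
    # Both jump types end a range. Shifting the mask by one row makes each
    # boundary the last row of the segment it closes; leading rows get id 0.
    seg = tag.isin(["jump", "cleanup_jump"]).shift(fill_value=False).cumsum()
    # Keep only runs whose segment is closed by a plain "jump".
    closed_by_jump = tag.groupby(seg).transform("last").eq("jump")
    run_seg = seg[tag.eq("run") & closed_by_jump]
    # Each jump reports the number of runs in its segment (0 if none).
    counts = run_seg.value_counts()
    jump_seg = seg[tag.eq("jump")]
    out["Jump_Run_Count"] = jump_seg.map(counts).reindex(out.index).fillna(0).astype(int)
    # Number runs from the latest (1) back to the earliest within each segment.
    order = run_seg.groupby(run_seg).cumcount(ascending=False) + 1
    out["Run_Order"] = order.reindex(out.index, fill_value=0)
    return out


df = pd.DataFrame({"Tag": TAGS})
result = add_jump_columns(df)
print(result)
```

Filtering to jump-closed segments before numbering is what keeps the run at row 17 out of Run_Order without any special-case code.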

One complication is that the first RUN occurrences are not between two JUMPs; they lie between the start of the column and the first JUMP.
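The leading range needs no special handling if each range is labelled with a cumulative sum over the JUMP mask shifted down by one row: the rows before the first JUMP simply become segment 0. A small sketch on a hypothetical six-row series:

```python
import pandas as pd

# Hypothetical input: two runs before the first jump, one run after it.
tags = pd.Series(["run", "skip_1", "run", "jump", "run", "jump"])
is_jump = tags.eq("jump")

# Shift by one row so each jump is the last row of the segment it closes;
# everything before the first jump then lands in segment 0.
segment = is_jump.shift(fill_value=False).cumsum()
print(segment.tolist())  # [0, 0, 0, 0, 1, 1]

# Runs per segment: the count each closing jump should report.
runs_per_segment = tags.eq("run").astype(int).groupby(segment).sum()
print(runs_per_segment.tolist())  # [2, 1]
```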

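Second, I want to do the same counting and numbering for the range between a JUMP and the following CLEANUP_JUMP, and store the results in separate columns.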

             Tag  Jump_Run_Count  Run_Order  Cleanup_Jump_Dig_Count  Run_Order2
0         skip_1               0          0                       0           0
1            run               0          5                       0           0
2         skip_1               0          0                       0           0
3            run               0          4                       0           0
4         skip_1               0          0                       0           0
5            run               0          3                       0           0
6         skip_2               0          0                       0           0
7            run               0          2                       0           0
8         skip_1               0          0                       0           0
9            run               0          1                       0           0
10        skip_2               0          0                       0           0
11          jump               5          0                       0           0
12        skip_1               0          0                       0           0
13           run               0          1                       0           0
14        skip_2               0          0                       0           0
15          jump               1          0                       0           0
16        skip_1               0          0                       0           0
17           run               0          0                       0           1
18        skip_2               0          0                       0           0
19  cleanup_jump               0          0                       1           0
20        skip_1               0          0                       0           0
21           run               0          2                       0           0
22        skip_2               0          0                       0           0
23           run               0          1                       0           0
24        skip_2               0          0                       0           0
25          jump               2          0                       0           0
26        skip_1               0          0                       0           0
27           run               0          1                       0           0
28        skip_2               0          0                       0           0
29          jump               1          0                       0           0
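One way to fill both pairs of columns with the same logic is to make the closing tag a parameter. This sketch (the function name runs_before is hypothetical) treats jump and cleanup_jump as boundaries, computes each run's position from the end of its segment as total runs minus the running count plus one, and reports values only for segments closed by the requested tag. It assumes a default RangeIndex and reuses the column names from the expected output above; runs after the final boundary are left at 0, a case the sample data does not cover.

```python
import pandas as pd

# Sample data from the question (default RangeIndex 0..29).
TAGS = [
    "skip_1", "run", "skip_1", "run", "skip_1",
    "run", "skip_2", "run", "skip_1", "run",
    "skip_2", "jump", "skip_1", "run", "skip_2",
    "jump", "skip_1", "run", "skip_2", "cleanup_jump",
    "skip_1", "run", "skip_2", "run", "skip_2",
    "jump", "skip_1", "run", "skip_2", "jump",
]


def runs_before(tag, closing_tag, boundaries=("jump", "cleanup_jump")):
    """Hypothetical helper: (count, order) for segments closed by closing_tag."""
    is_run = tag.eq("run").astype(int)
    # Segment ids: each boundary is the last row of the segment it closes.
    seg = tag.isin(boundaries).shift(fill_value=False).cumsum()
    closed_here = tag.groupby(seg).transform("last").eq(closing_tag)
    total = is_run.groupby(seg).transform("sum")
    # The count is shown only on the closing boundary row itself.
    count = total.where(tag.eq(closing_tag), 0)
    # Position from the end: the latest run gets 1, the earliest gets `total`.
    from_end = total - is_run.groupby(seg).cumsum() + is_run
    order = from_end.where(is_run.eq(1) & closed_here, 0)
    return count, order


df = pd.DataFrame({"Tag": TAGS})
df["Jump_Run_Count"], df["Run_Order"] = runs_before(df["Tag"], "jump")
df["Cleanup_Jump_Dig_Count"], df["Run_Order2"] = runs_before(df["Tag"], "cleanup_jump")
print(df)
```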

I've added some pictures that may explain it better:

[Image: Scenario 1]

[Image: Scenario 2]

Any help on how to code this, or even a different approach to solving the problem, would be highly appreciated.

Thanks!

Here is a solution using pandas:

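import pandas as pd
import numpy as np

# Sample frame from the question. The code assumes a default RangeIndex,
# because positions from np.flatnonzero are later used as .loc labels.
df = pd.DataFrame({'Tag': [
    'skip_1', 'run', 'skip_1', 'run', 'skip_1', 'run', 'skip_2', 'run', 'skip_1', 'run',
    'skip_2', 'jump', 'skip_1', 'run', 'skip_2', 'jump', 'skip_1', 'run', 'skip_2', 'cleanup_jump',
    'skip_1', 'run', 'skip_2', 'run', 'skip_2', 'jump', 'skip_1', 'run', 'skip_2', 'jump',
]})

# Flag RUN rows and treat both jump types as range boundaries.
df['run'] = df['Tag'] == 'run'
val_mask = df['Tag'].replace({'cleanup_jump': 'jump'}) == 'jump'
# Group id increments on every boundary, so the rows before the first jump form group 0.
df['tag_id'] = val_mask.cumsum()
# Run count of each group goes onto the boundary that closes it; the last group
# (after the final boundary) is dropped. Assumes the frame does not start with a jump.
df.loc[val_mask, 'Jump_Count'] = df.groupby('tag_id')['run'].sum().to_numpy()[:-1]
# Number the runs within each group (1 = earliest) ...
df.loc[df['run'], 'run_per_jump'] = df.loc[df['run']].groupby('tag_id')['run'].cumsum()
# ... then rank in reverse so the latest run before a boundary gets 1.
df['Jump_Run_Order'] = df.groupby('tag_id')['run_per_jump'].rank(method='dense', ascending=False)

# For every cleanup_jump, find the position of the jump that precedes it.
jumps_idx = np.flatnonzero(df['Tag'] == 'jump')
cj_idxs = np.flatnonzero(df['Tag'] == 'cleanup_jump')
cj_help_idxs = np.asarray([np.max(jumps_idx[jumps_idx < cj_idx]) for cj_idx in cj_idxs])

# Move the values in (jump, cleanup_jump] into the cleanup columns and clear
# them from the jump columns.
for start, end in zip(cj_help_idxs + 1, cj_idxs):
    df.loc[start:end, 'Cleanup_Jump_Count'] = df.loc[start:end, 'Jump_Count']
    df.loc[start:end, 'Cleanup_Jump_Run_Order'] = df.loc[start:end, 'Jump_Run_Order']
    df.loc[start:end, 'Jump_Run_Order'] = 0
    df.loc[start:end, 'Jump_Count'] = 0

# Drop helper columns, replace NaN with 0 and convert to integer dtypes.
df = df.drop(columns=['tag_id', 'run', 'run_per_jump']).fillna(0).convert_dtypes(convert_integer=True)

print(df)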
             Tag  Jump_Count  Jump_Run_Order  Cleanup_Jump_Run_Order  Cleanup_Jump_Count
0         skip_1           0               0                       0                   0
1            run           0               5                       0                   0
2         skip_1           0               0                       0                   0
3            run           0               4                       0                   0
4         skip_1           0               0                       0                   0
5            run           0               3                       0                   0
6         skip_2           0               0                       0                   0
7            run           0               2                       0                   0
8         skip_1           0               0                       0                   0
9            run           0               1                       0                   0
10        skip_2           0               0                       0                   0
11          jump           5               0                       0                   0
12        skip_1           0               0                       0                   0
13           run           0               1                       0                   0
14        skip_2           0               0                       0                   0
15          jump           1               0                       0                   0
16        skip_1           0               0                       0                   0
17           run           0               0                       1                   0
18        skip_2           0               0                       0                   0
19  cleanup_jump           0               0                       0                   1
20        skip_1           0               0                       0                   0
21           run           0               2                       0                   0
22        skip_2           0               0                       0                   0
23           run           0               1                       0                   0
24        skip_2           0               0                       0                   0
25          jump           2               0                       0                   0
26        skip_1           0               0                       0                   0
27           run           0               1                       0                   0
28        skip_2           0               0                       0                   0
29          jump           1               0                       0                   0
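In the answer above, the list comprehension that builds cj_help_idxs scans jumps_idx once per cleanup_jump. Because np.flatnonzero returns positions in ascending order, np.searchsorted can find the preceding jump for every cleanup_jump in one call. A sketch on the question's data that checks both versions agree; the explicit error for a cleanup_jump with no earlier jump is an added assumption (the original np.max call would instead raise on an empty selection).

```python
import numpy as np
import pandas as pd

# Sample data from the question (default RangeIndex 0..29).
TAGS = [
    "skip_1", "run", "skip_1", "run", "skip_1",
    "run", "skip_2", "run", "skip_1", "run",
    "skip_2", "jump", "skip_1", "run", "skip_2",
    "jump", "skip_1", "run", "skip_2", "cleanup_jump",
    "skip_1", "run", "skip_2", "run", "skip_2",
    "jump", "skip_1", "run", "skip_2", "jump",
]
df = pd.DataFrame({"Tag": TAGS})

jumps_idx = np.flatnonzero(df["Tag"] == "jump")
cj_idxs = np.flatnonzero(df["Tag"] == "cleanup_jump")

# Loop version from the answer: last jump strictly before each cleanup_jump.
loop_version = np.asarray([np.max(jumps_idx[jumps_idx < cj]) for cj in cj_idxs])

# Vectorized version: jumps_idx is sorted ascending, so the insertion point of
# each cleanup_jump minus one is the index of the last jump before it.
pos = np.searchsorted(jumps_idx, cj_idxs) - 1
if (pos < 0).any():
    # Assumption: treat a cleanup_jump with no earlier jump as an input error.
    raise ValueError("cleanup_jump without a preceding jump")
cj_help_idxs = jumps_idx[pos]

print(cj_help_idxs)  # [15]
```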
