简体   繁体   English

计算特定范围内的出现次数

[英]Count occurrences within a specific range

I have a data frame that looks like this:我有一个看起来像这样的数据框:

               Tag
0           skip_1
1              run
2           skip_1
3              run
4           skip_1
5              run
6           skip_2
7              run
8           skip_1
9              run
10          skip_2
11            jump
12          skip_1
13             run
14          skip_2
15            jump
16          skip_1
17             run
18          skip_2
19    cleanup_jump
20          skip_1
21             run
22          skip_2
23             run
24          skip_2
25            jump
26          skip_1
27             run
28          skip_2
29            jump

First, I would like to count the RUN occurrences between two JUMP events, then to enumerate this occurrences from the latest to the earliest within this range.首先,我想计算两个 JUMP 事件之间的 RUN 发生次数,然后在此范围内从最近到最早枚举此事件。 The expected results would be:预期结果将是:

             Tag  Jump_Run_Count  Run_Order
0         skip_1               0          0
1            run               0          5
2         skip_1               0          0
3            run               0          4
4         skip_1               0          0
5            run               0          3
6         skip_2               0          0
7            run               0          2
8         skip_1               0          0
9            run               0          1
10        skip_2               0          0
11          jump               5          0
12        skip_1               0          0
13           run               0          1
14        skip_2               0          0
15          jump               1          0
16        skip_1               0          0
17           run               0          0
18        skip_2               0          0
19  cleanup_jump               0          0
20        skip_1               0          0
21           run               0          2
22        skip_2               0          0
23           run               0          1
24        skip_2               0          0
25          jump               2          0
26        skip_1               0          0
27           run               0          1
28        skip_2               0          0
29          jump               1          0

One of the problems here is that the first RUN occurrences are not within 2 JUMP but are between the first JUMP and the beginning of the column.这里的一个问题是第一个 RUN 出现不在 2 JUMP 内,而是在第一个 JUMP 和列的开头之间。

Secondly I would like to do the same count and enumerate for a CLEANUP_JUMP and JUMP range, and store it in separate columns.其次,我想对 CLEANUP_JUMP 和 JUMP 范围进行相同的计数和枚举,并将其存储在单独的列中。

             Tag  Jump_Run_Count  Run_Order  Cleanup_Jump_Dig_Count  Run_Order2
0         skip_1               0          0                       0           0
1            run               0          5                       0           0
2         skip_1               0          0                       0           0
3            run               0          4                       0           0
4         skip_1               0          0                       0           0
5            run               0          3                       0           0
6         skip_2               0          0                       0           0
7            run               0          2                       0           0
8         skip_1               0          0                       0           0
9            run               0          1                       0           0
10        skip_2               0          0                       0           0
11          jump               5          0                       0           0
12        skip_1               0          0                       0           0
13           run               0          1                       0           0
14        skip_2               0          0                       0           0
15          jump               1          0                       0           0
16        skip_1               0          0                       0           0
17           run               0          0                       0           1
18        skip_2               0          0                       0           0
19  cleanup_jump               0          0                       1           0
20        skip_1               0          0                       0           0
21           run               0          2                       0           0
22        skip_2               0          0                       0           0
23           run               0          1                       0           0
24        skip_2               0          0                       0           0
25          jump               2          0                       0           0
26        skip_1               0          0                       0           0
27           run               0          1                       0           0
28        skip_2               0          0                       0           0
29          jump               1          0                       0           0

I have added some pictures that might explain it better:我添加了一些可能更好地解释它的图片:

Scenario1场景一

Scenario2场景2

Any help on how to code this, or even another way to approach this issue will be highly appreciated.任何有关如何对此进行编码的帮助,或者甚至是解决此问题的其他方法的任何帮助都将受到高度赞赏。

Thanks!谢谢!

Here is a solution using pandas:这是使用 pandas 的解决方案:

import pandas as pd
import numpy as np

df['run'] = df['Tag'] == 'run'
val_mask = df['Tag'].replace({'cleanup_jump':'jump'}) == 'jump'
df['tag_id'] = (val_mask).cumsum()
df.loc[val_mask, 'Jump_Count'] = df.groupby('tag_id')['run'].sum().to_numpy()[:-1]
df.loc[df['run'], 'run_per_jump'] = df.loc[df['run']].groupby('tag_id')['run'].cumsum()
df['Jump_Run_Order'] = df.groupby('tag_id')['run_per_jump'].rank(method='dense', ascending=False)

jumps_idx = np.flatnonzero(df['Tag'] == 'jump')
cj_idxs = np.flatnonzero(df['Tag'] == 'cleanup_jump')
cj_help_idxs = np.asarray([np.max(jumps_idx[jumps_idx < cj_idx]) for cj_idx in cj_idxs])

for start, end in zip(cj_help_idxs+1, cj_idxs):
    df.loc[start:end, 'Cleanup_Jump_Count'] = df.loc[start:end, 'Jump_Count']
    df.loc[start:end, 'Cleanup_Jump_Run_Order'] = df.loc[start:end, 'Jump_Run_Order']
    df.loc[start:end, 'Jump_Run_Order'] = 0
    df.loc[start:end, 'Jump_Count'] = 0

df = df.drop(columns=['tag_id', 'run', 'run_per_jump']).fillna(0).convert_dtypes(convert_integer=True)

print(df)
             Tag  Jump_Count  Jump_Run_Order  Cleanup_Jump_Run_Order  Cleanup_Jump_Count
0         skip_1           0               0                       0                   0
1            run           0               5                       0                   0
2         skip_1           0               0                       0                   0
3            run           0               4                       0                   0
4         skip_1           0               0                       0                   0
5            run           0               3                       0                   0
6         skip_2           0               0                       0                   0
7            run           0               2                       0                   0
8         skip_1           0               0                       0                   0
9            run           0               1                       0                   0
10        skip_2           0               0                       0                   0
11          jump           5               0                       0                   0
12        skip_1           0               0                       0                   0
13           run           0               1                       0                   0
14        skip_2           0               0                       0                   0
15          jump           1               0                       0                   0
16        skip_1           0               0                       0                   0
17           run           0               0                       1                   0
18        skip_2           0               0                       0                   0
19  cleanup_jump           0               0                       0                   1
20        skip_1           0               0                       0                   0
21           run           0               2                       0                   0
22        skip_2           0               0                       0                   0
23           run           0               1                       0                   0
24        skip_2           0               0                       0                   0
25          jump           2               0                       0                   0
26        skip_1           0               0                       0                   0
27           run           0               1                       0                   0
28        skip_2           0               0                       0                   0
29          jump           1               0                       0                   0

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM