Indexing a group in R with dplyr
I have a dataset as below:
structure(AI_decs)
               Horse             Time RaceID dyLTO Value.LTO Draw.IV
 1      Warne's Army 06/04/2021 13:00      1    56      3429    0.88
 2     G For Gabrial 06/04/2021 13:00      1    57      3299    1.15
 3      First Charge 06/04/2021 13:00      1    66      3429    1.06
 4     Dream With Me 06/04/2021 13:00      1    62      2862    0.97
 5          Qawamees 06/04/2021 13:00      1    61      4690    0.97
 6       Glan Y Gors 06/04/2021 13:00      1    59      3429    1.50
 7  The Dancing Poet 06/04/2021 13:00      1    42      4690    1.41
 8            Finoah 06/04/2021 13:00      1    59     10260    0.97
 9         Ravenscar 06/04/2021 13:30      2    58      5208    0.65
10        Arabescato 06/04/2021 13:30      2    57      2862    1.09
11      Thai Terrier 06/04/2021 13:30      2    58      7439    1.30
12 The Rutland Rebel 06/04/2021 13:30      2    55      3429    2.17
13       Red Tornado 06/04/2021 13:30      2    49      3340    0.43
14           Alfredo 06/04/2021 13:30      2    54      5208    1.30
15   Tynecastle Park 06/04/2021 13:30      2    72      7439    0.87
16         Waldkonig 06/04/2021 14:00      3    55      3493    1.35
17     Kaleidoscopic 06/04/2021 14:00      3    68      7439    1.64
18         Louganini 06/04/2021 14:00      3    75     56025    1.26
I have a set of columns with performance data for the horses in each race. My full dataset has many more rows and covers a number of horse races on a given day. Each race has a unique start time and a different number of horses.
Basically, I want to assign a race ID (an index number) to each individual race.
At the moment I have to do this in Excel (see the RaceID column) by comparing the Time column row by row and adding 1 to RaceID every time a new race starts. This has to be done manually each day before I import the data into R.
I hope there is a way to do this with dplyr. I thought that if I used group_by() on Time there might be a function, a bit like n() or row_number(), that would index the races for me. Perhaps case_when() with lag()/lead() would work.
Thanks in advance for any help.
Graham
Try this:
Note: group_indices() (called without arguments inside a verb) was deprecated in dplyr 1.0.0; cur_group_id() is its replacement.
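library(dplyr)
# Example data: four race times with three runners each
df <- data.frame(time = rep(c("06/04/2021 13:00", "06/04/2021 13:30", "06/04/2021 14:00", "07/04/2021 14:00"), each = 3))
# cur_group_id() returns the index of the current group,
# so every row of the same race receives the same id
df %>%
  group_by(time) %>%
  mutate(race_id = cur_group_id())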
#> # A tibble: 12 x 2
#> # Groups: time [4]
#> time race_id
#> <chr> <int>
#> 1 06/04/2021 13:00 1
#> 2 06/04/2021 13:00 1
#> 3 06/04/2021 13:00 1
#> 4 06/04/2021 13:30 2
#> 5 06/04/2021 13:30 2
#> 6 06/04/2021 13:30 2
#> 7 06/04/2021 14:00 3
#> 8 06/04/2021 14:00 3
#> 9 06/04/2021 14:00 3
#> 10 07/04/2021 14:00 4
#> 11 07/04/2021 14:00 4
#> 12 07/04/2021 14:00 4
Created on 2021-04-10 by the reprex package (v2.0.0)
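One caveat with the grouped approach above: cur_group_id() numbers the groups in the sort order of the grouping key, and time here is a character column in dd/mm/yyyy format, so the ids follow text order rather than chronological order once the data spans more than one month. Below is a minimal sketch of one way around this. It assumes the timestamps always follow the "%d/%m/%Y %H:%M" pattern and that UTC is an acceptable time zone; the names races, races_indexed and time_parsed are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical data: 7 April comes first chronologically,
# but "06/05/2021" sorts before "07/04/2021" as text
races <- data.frame(
  time = rep(c("07/04/2021 14:00", "06/05/2021 13:00"), each = 2)
)

races_indexed <- races %>%
  mutate(
    # Parse the text into a real date-time (format and tz are assumptions)
    time_parsed = as.POSIXct(time, format = "%d/%m/%Y %H:%M", tz = "UTC"),
    # dense_rank() gives 1 to the earliest race, 2 to the next, with no gaps
    race_id = dense_rank(time_parsed)
  )

races_indexed
```

If the data only ever covers a single day, as in the question, the text order and the chronological order coincide and the original answer works unchanged.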
You can group by the run-length ID produced by data.table's rleid() function:
library(dplyr)
library(data.table)
# rleid() starts a new id whenever time changes from one row to the next
df %>%
  group_by(race_id = rleid(time))
# A tibble: 12 x 2
# Groups: race_id [4]
time race_id
<chr> <int>
1 06/04/2021 13:00 1
2 06/04/2021 13:00 1
3 06/04/2021 13:00 1
4 06/04/2021 13:30 2
5 06/04/2021 13:30 2
6 06/04/2021 13:30 2
7 06/04/2021 14:00 3
8 06/04/2021 14:00 3
9 06/04/2021 14:00 3
10 07/04/2021 14:00 4
11 07/04/2021 14:00 4
12 07/04/2021 14:00 4
Data, from @Peter:
df <- data.frame(time = rep(c("06/04/2021 13:00", "06/04/2021 13:30", "06/04/2021 14:00", "07/04/2021 14:00"), each = 3))
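The run-length idea is essentially the Excel procedure from the question: compare each Time with the row above and add 1 when it changes. As a rough sketch, the same thing can be written with dplyr alone using lag() and cumsum(), assuming the rows are already ordered so that each race forms one contiguous block. The names df_demo and df_indexed are hypothetical.

```r
library(dplyr)

# Same example data as above, under a hypothetical name
df_demo <- data.frame(
  time = rep(c("06/04/2021 13:00", "06/04/2021 13:30",
               "06/04/2021 14:00", "07/04/2021 14:00"), each = 3)
)

df_indexed <- df_demo %>%
  mutate(
    # TRUE on every row whose time differs from the previous row;
    # the default makes the first row compare equal to itself
    new_race = time != lag(time, default = first(time)),
    # Running count of changes, shifted so the first race is 1
    race_id = cumsum(new_race) + 1L
  )

df_indexed
```

Unlike cur_group_id(), this numbers races in order of appearance, so a time that reappears later in the data after a different race would get a new id. dplyr 1.1.0 and later also provide consecutive_id(), which computes the same kind of run-based id without the extra package.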