I have a dataset as below:
structure(AI_decs)
               Horse             Time RaceID dyLTO Value.LTO Draw.IV
1       Warne's Army 06/04/2021 13:00      1    56      3429    0.88
2      G For Gabrial 06/04/2021 13:00      1    57      3299    1.15
3       First Charge 06/04/2021 13:00      1    66      3429    1.06
4      Dream With Me 06/04/2021 13:00      1    62      2862    0.97
5           Qawamees 06/04/2021 13:00      1    61      4690    0.97
6        Glan Y Gors 06/04/2021 13:00      1    59      3429    1.50
7   The Dancing Poet 06/04/2021 13:00      1    42      4690    1.41
8             Finoah 06/04/2021 13:00      1    59     10260    0.97
9          Ravenscar 06/04/2021 13:30      2    58      5208    0.65
10        Arabescato 06/04/2021 13:30      2    57      2862    1.09
11      Thai Terrier 06/04/2021 13:30      2    58      7439    1.30
12 The Rutland Rebel 06/04/2021 13:30      2    55      3429    2.17
13       Red Tornado 06/04/2021 13:30      2    49      3340    0.43
14           Alfredo 06/04/2021 13:30      2    54      5208    1.30
15   Tynecastle Park 06/04/2021 13:30      2    72      7439    0.87
16         Waldkonig 06/04/2021 14:00      3    55      3493    1.35
17     Kaleidoscopic 06/04/2021 14:00      3    68      7439    1.64
18         Louganini 06/04/2021 14:00      3    75     56025    1.26
The columns hold performance data for the horses in each race. My real dataset has many more rows and covers a number of horse races on a given day. Each race has a unique start time, and the number of horses varies from race to race.
Basically, I want to assign a RaceID (an index number) to each individual race.
I currently do this in Excel (see the RaceID column) by comparing values in the Time column and adding 1 to RaceID every time a new race starts. This has to be done manually each day before I import the data into R.
I hope there is a way to do this in R with dplyr. I thought that if I use group_by() on Time, there might be a function a bit like n() or row_number() that would index the races for me.
Perhaps it could also be done with case_when() and lag()/lead().
Thanks in advance for any help. Graham
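Try this, which uses cur_group_id() to number the groups.
Note: calling group_indices() with no arguments was deprecated in dplyr 1.0.0, with cur_group_id() as its replacement.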
library(dplyr)
df <- data.frame(time = rep(c("06/04/2021 13:00", "06/04/2021 13:30", "06/04/2021 14:00", "07/04/2021 14:00"), each = 3))
df %>%
  group_by(time) %>%
  mutate(race_id = cur_group_id())
#> # A tibble: 12 x 2
#> # Groups: time [4]
#> time race_id
#> <chr> <int>
#> 1 06/04/2021 13:00 1
#> 2 06/04/2021 13:00 1
#> 3 06/04/2021 13:00 1
#> 4 06/04/2021 13:30 2
#> 5 06/04/2021 13:30 2
#> 6 06/04/2021 13:30 2
#> 7 06/04/2021 14:00 3
#> 8 06/04/2021 14:00 3
#> 9 06/04/2021 14:00 3
#> 10 07/04/2021 14:00 4
#> 11 07/04/2021 14:00 4
#> 12 07/04/2021 14:00 4
Created on 2021-04-10 by the reprex package (v2.0.0)
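One thing worth checking with this approach: as far as I know, cur_group_id() numbers the groups in the sorted order of the grouping key, not in the order the rows appear. Because the times here are day/month/year character strings, they sort as text, so across a month boundary the ids may not come out chronological. The following is only a sketch under assumptions: df2, by_clock, time_parsed and race_id_seen are made-up names, and the date format is assumed to match the sample data.

```r
library(dplyr)

# Hypothetical data: two races on either side of a month boundary.
df2 <- data.frame(
  time = rep(c("30/04/2021 13:00", "01/05/2021 13:00"), each = 2)
)

by_clock <- df2 %>%
  # Assumption: strings are day/month/year hour:minute; UTC is used only for parsing.
  mutate(time_parsed = as.POSIXct(time, format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  group_by(time_parsed) %>%
  mutate(race_id = cur_group_id()) %>%
  ungroup()

# Alternative that skips parsing: number races in order of first appearance.
df2$race_id_seen <- match(df2$time, unique(df2$time))
```

Grouping on the parsed date-time should give ids in chronological order, while match() against unique() simply follows the row order of the file, which may be closer to the manual Excel process described in the question.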
You can group by data.table's function rleid (i.e., run-length ID):
library(dplyr)
library(data.table)
df %>%
  group_by(race_id = rleid(time))
# A tibble: 12 x 2
# Groups: race_id [4]
time race_id
<chr> <int>
1 06/04/2021 13:00 1
2 06/04/2021 13:00 1
3 06/04/2021 13:00 1
4 06/04/2021 13:30 2
5 06/04/2021 13:30 2
6 06/04/2021 13:30 2
7 06/04/2021 14:00 3
8 06/04/2021 14:00 3
9 06/04/2021 14:00 3
10 07/04/2021 14:00 4
11 07/04/2021 14:00 4
12 07/04/2021 14:00 4
Data, from @Peter:
df <- data.frame(time = rep(c("06/04/2021 13:00", "06/04/2021 13:30", "06/04/2021 14:00", "07/04/2021 14:00"), each = 3))
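If adding data.table as a dependency is not desirable, the same run-based numbering can probably be done with dplyr alone, along the lines of the lag() idea from the question. This is a minimal sketch, not a tested production solution: new_race and res are illustrative names, and it assumes the rows are already ordered so that each race forms one contiguous block.

```r
library(dplyr)

df <- data.frame(time = rep(c("06/04/2021 13:00", "06/04/2021 13:30",
                              "06/04/2021 14:00", "07/04/2021 14:00"), each = 3))

res <- df %>%
  mutate(
    # TRUE wherever the time differs from the previous row. The first row is
    # compared with itself (via default), so it is FALSE and counting starts at 1.
    new_race = time != lag(time, default = first(time)),
    # Running count of race changes, shifted so the first race gets id 1.
    race_id  = cumsum(new_race) + 1L
  )
```

This mirrors the manual Excel step of adding 1 each time the Time value changes. If your dplyr is version 1.1.0 or later, consecutive_id(time) should give the same ids in a single call.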