I have a very similar task to the original poster in this thread: Create new column in dataframe based on partial string matching other column
however have 10 different conditions under TEST. There was a suggestion in the original thread how to code for >3 conditions however I wasn't able to understand how to apply it to my data.
I want to create a column called DISTANCE that extracts the distance from the test. So for any test that includes "0.10m" in the name, I want to be able to have "0-10m" in the distance column. If it is "0.20m" in the name, I want it to be "0-20m" in the DISTANCE column and so on.
PLAYER SEX TEST VALUE
Player 1 Female ICE_0.10m 2.100000
Player 1 Female ICE_0.20m 3.475000
Player 1 Female ICE_10.20m 1.375000
Player 1 Female ICE_20.30m 1.246000
Player 1 Female ICE_0.30m 4.721000
Player 1 Female ICE_Vel_0.10m 4.761905
Player 1 Female ICE_Vel_0.20m 5.755396
Player 1 Female ICE_Vel_10.20m 7.272727
Player 1 Female ICE_Vel_20.30m 8.025682
Player 1 Female ICE_Vel_0.30m 6.354586
Player 1 Female OFF_0.10m 1.983000
Player 1 Female OFF_0.20m 3.380000
Player 1 Female OFF_10.20m 1.397000
Player 1 Female OFF_20.30m 1.380000
Player 1 Female OFF_0.30m 4.760000
Player 1 Female OFF_Vel_0.10m 5.042864
Player 1 Female OFF_Vel_0.20m 5.917160
Player 1 Female OFF_Vel_10.20m 7.158196
Player 1 Female OFF_Vel_20.30m 7.246377
Player 1 Female OFF_Vel_0.30m 6.302521
I tried this but it didn't work:
SpeedLong$Distance <- ifelse(grepl("0.10m", SpeedLong$Tag, ignore.case = T), "0-10m",
ifelse(grepl("0.20m", SpeedLong$Tag, ignore.case = T), "0-20m",
ifelse(grepl("0.30m", SpeedLong$Tag, ignore.case = T), "0-30m",
ifelse(grepl("0.10m", SpeedLong$Tag, ignore.case = T), "0-10m", "20-30m"))
With that code I don't get an error message but it shows the code in the console finishing with a + sign I guess meaning the code is incomplete? I don't know if if else and grepl are the best ways to go about this so alternative suggestions are welcome!
Instead of nested ifelse
, a better option would be to extract the matching substring and change the .
to -
with a regex pattern. Here, we match characters ( .*
) until the _
, capture the first set of numbers ( [0-9]+
) as a group ( (...)
) followed by the dot ( \\\\.
- dot is a metacharacter that matches any character, so we escape ( \\\\
) it to get the literal value), followed by second set of numbers in another capture group, and in the replacement
use the backreference ( \\\\1
, \\\\2
) of the captured group
library(dplyr)
library(stringr)
df1 %>%
mutate(DISTANCE = str_replace(TEST, ".*_([0-9]+)\\.([0-9]+)", "\\1-\\2"))
# PLAYER SEX TEST VALUE DISTANCE
#1 Player 1 Female ICE_0.10m 2.100000 0-10m
#2 Player 1 Female ICE_0.20m 3.475000 0-20m
#3 Player 1 Female ICE_10.20m 1.375000 10-20m
#4 Player 1 Female ICE_20.30m 1.246000 20-30m
#5 Player 1 Female ICE_0.30m 4.721000 0-30m
#6 Player 1 Female ICE_Vel_0.10m 4.761905 0-10m
#7 Player 1 Female ICE_Vel_0.20m 5.755396 0-20m
#8 Player 1 Female ICE_Vel_10.20m 7.272727 10-20m
#9 Player 1 Female ICE_Vel_20.30m 8.025682 20-30m
#10 Player 1 Female ICE_Vel_0.30m 6.354586 0-30m
#11 Player 1 Female OFF_0.10m 1.983000 0-10m
#12 Player 1 Female OFF_0.20m 3.380000 0-20m
#13 Player 1 Female OFF_10.20m 1.397000 10-20m
#14 Player 1 Female OFF_20.30m 1.380000 20-30m
#15 Player 1 Female OFF_0.30m 4.760000 0-30m
#16 Player 1 Female OFF_Vel_0.10m 5.042864 0-10m
#17 Player 1 Female OFF_Vel_0.20m 5.917160 0-20m
#18 Player 1 Female OFF_Vel_10.20m 7.158196 10-20m
#19 Player 1 Female OFF_Vel_20.30m 7.246377 20-30m
#20 Player 1 Female OFF_Vel_0.30m 6.302521 0-30m
Or using base R
df1$DISTANCE <- sub(".*_([0-9]+)\\.([0-9]+)", "\\1-\\2", df1$TEST)
df1 <- structure(list(PLAYER = c("Player 1", "Player 1", "Player 1",
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1", "Player 1",
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1", "Player 1",
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1"),
SEX = c("Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female"), TEST = c("ICE_0.10m", "ICE_0.20m",
"ICE_10.20m", "ICE_20.30m", "ICE_0.30m", "ICE_Vel_0.10m",
"ICE_Vel_0.20m", "ICE_Vel_10.20m", "ICE_Vel_20.30m", "ICE_Vel_0.30m",
"OFF_0.10m", "OFF_0.20m", "OFF_10.20m", "OFF_20.30m", "OFF_0.30m",
"OFF_Vel_0.10m", "OFF_Vel_0.20m", "OFF_Vel_10.20m", "OFF_Vel_20.30m",
"OFF_Vel_0.30m"), VALUE = c(2.1, 3.475, 1.375, 1.246, 4.721,
4.761905, 5.755396, 7.272727, 8.025682, 6.354586, 1.983,
3.38, 1.397, 1.38, 4.76, 5.042864, 5.91716, 7.158196, 7.246377,
6.302521)), class = "data.frame", row.names = c(NA, -20L))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.