简体   繁体   中英

Create New Column in R - Extract regular characters from other column

I have a very similar task to the original poster in this thread: Create new column in dataframe based on partial string matching other column

however have 10 different conditions under TEST. There was a suggestion in the original thread how to code for >3 conditions however I wasn't able to understand how to apply it to my data.

I want to create a column called DISTANCE that extracts the distance from the test. So for any test that includes "0.10m" in the name, I want to be able to have "0-10m" in the distance column. If it is "0.20m" in the name, I want it to be "0-20m" in the DISTANCE column and so on.

PLAYER      SEX     TEST        VALUE             
Player 1    Female    ICE_0.10m    2.100000
Player 1    Female    ICE_0.20m    3.475000
Player 1    Female    ICE_10.20m    1.375000
Player 1    Female    ICE_20.30m    1.246000
Player 1    Female    ICE_0.30m    4.721000
Player 1    Female    ICE_Vel_0.10m    4.761905
Player 1    Female    ICE_Vel_0.20m    5.755396
Player 1    Female    ICE_Vel_10.20m    7.272727
Player 1    Female    ICE_Vel_20.30m    8.025682
Player 1    Female    ICE_Vel_0.30m    6.354586
Player 1    Female    OFF_0.10m    1.983000
Player 1    Female    OFF_0.20m    3.380000
Player 1    Female    OFF_10.20m    1.397000
Player 1    Female    OFF_20.30m    1.380000
Player 1    Female    OFF_0.30m    4.760000
Player 1    Female    OFF_Vel_0.10m    5.042864
Player 1    Female    OFF_Vel_0.20m    5.917160
Player 1    Female    OFF_Vel_10.20m    7.158196
Player 1    Female    OFF_Vel_20.30m    7.246377
Player 1    Female    OFF_Vel_0.30m    6.302521

I tried this but it didn't work:

SpeedLong$Distance <- ifelse(grepl("0.10m", SpeedLong$Tag, ignore.case = T), "0-10m",
ifelse(grepl("0.20m", SpeedLong$Tag, ignore.case = T), "0-20m",
ifelse(grepl("0.30m", SpeedLong$Tag, ignore.case = T), "0-30m",
ifelse(grepl("0.10m", SpeedLong$Tag, ignore.case = T), "0-10m", "20-30m"))

With that code I don't get an error message but it shows the code in the console finishing with a + sign I guess meaning the code is incomplete? I don't know if if else and grepl are the best ways to go about this so alternative suggestions are welcome!

Instead of nested ifelse , a better option would be to extract the matching substring and change the . to - with a regex pattern. Here, we match characters ( .* ) until the _ , capture the first set of numbers ( [0-9]+ ) as a group ( (...) ) followed by the dot ( \\\\. - dot is a metacharacter that matches any character, so we escape ( \\\\ ) it to get the literal value), followed by second set of numbers in another capture group, and in the replacement use the backreference ( \\\\1 , \\\\2 ) of the captured group

library(dplyr)
library(stringr)
df1 %>% 
    mutate(DISTANCE = str_replace(TEST, ".*_([0-9]+)\\.([0-9]+)", "\\1-\\2"))
#     PLAYER    SEX           TEST    VALUE DISTANCE
#1  Player 1 Female      ICE_0.10m 2.100000    0-10m
#2  Player 1 Female      ICE_0.20m 3.475000    0-20m
#3  Player 1 Female     ICE_10.20m 1.375000   10-20m
#4  Player 1 Female     ICE_20.30m 1.246000   20-30m
#5  Player 1 Female      ICE_0.30m 4.721000    0-30m
#6  Player 1 Female  ICE_Vel_0.10m 4.761905    0-10m
#7  Player 1 Female  ICE_Vel_0.20m 5.755396    0-20m
#8  Player 1 Female ICE_Vel_10.20m 7.272727   10-20m
#9  Player 1 Female ICE_Vel_20.30m 8.025682   20-30m
#10 Player 1 Female  ICE_Vel_0.30m 6.354586    0-30m
#11 Player 1 Female      OFF_0.10m 1.983000    0-10m
#12 Player 1 Female      OFF_0.20m 3.380000    0-20m
#13 Player 1 Female     OFF_10.20m 1.397000   10-20m
#14 Player 1 Female     OFF_20.30m 1.380000   20-30m
#15 Player 1 Female      OFF_0.30m 4.760000    0-30m
#16 Player 1 Female  OFF_Vel_0.10m 5.042864    0-10m
#17 Player 1 Female  OFF_Vel_0.20m 5.917160    0-20m
#18 Player 1 Female OFF_Vel_10.20m 7.158196   10-20m
#19 Player 1 Female OFF_Vel_20.30m 7.246377   20-30m
#20 Player 1 Female  OFF_Vel_0.30m 6.302521    0-30m

Or using base R

df1$DISTANCE <- sub(".*_([0-9]+)\\.([0-9]+)", "\\1-\\2", df1$TEST)

data

df1 <- structure(list(PLAYER = c("Player 1", "Player 1", "Player 1", 
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1", "Player 1", 
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1", "Player 1", 
"Player 1", "Player 1", "Player 1", "Player 1", "Player 1"), 
    SEX = c("Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female"), TEST = c("ICE_0.10m", "ICE_0.20m", 
    "ICE_10.20m", "ICE_20.30m", "ICE_0.30m", "ICE_Vel_0.10m", 
    "ICE_Vel_0.20m", "ICE_Vel_10.20m", "ICE_Vel_20.30m", "ICE_Vel_0.30m", 
    "OFF_0.10m", "OFF_0.20m", "OFF_10.20m", "OFF_20.30m", "OFF_0.30m", 
    "OFF_Vel_0.10m", "OFF_Vel_0.20m", "OFF_Vel_10.20m", "OFF_Vel_20.30m", 
    "OFF_Vel_0.30m"), VALUE = c(2.1, 3.475, 1.375, 1.246, 4.721, 
    4.761905, 5.755396, 7.272727, 8.025682, 6.354586, 1.983, 
    3.38, 1.397, 1.38, 4.76, 5.042864, 5.91716, 7.158196, 7.246377, 
    6.302521)), class = "data.frame", row.names = c(NA, -20L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM