I am trying to use gsub to remove everything before any of several symbols in a column in R. Let me explain with an example:
data <- data.frame(c("1_a-b","2: b-c","3_c-d"))
colnames(data) <- "ABC"
I want the final dataset to look like:
data <- data.frame(c("a-b","b-c","c-d"))
colnames(data) <- "ABC"
I am doing this:
if(any(grepl(":|_", data$ABC))){
data$ABC <- gsub(".*_", "", data$ABC)
}
I tried using pipe "|" to add another condition like:
if(any(grepl(":|_", data$ABC))){
data$ABC <- gsub(".*_"|".*:", "", data$ABC)
}
But it doesn't work. Is there a way to do it in one step? Also, I have to check whether the column contains these symbols or not, hence the grepl.
You may use the following regex if you need to remove everything up to the last _ or : character:
sub(".*[_:]\\s*", "", data$ABC)
Or, if you need to remove everything up to the first _ or : character:
sub(".*?[_:]\\s*", "", data$ABC)
Pattern details:
.*? - any 0+ chars, as few as possible (.* matches 0 or more chars, as many as possible)
[_:] - a _ or a :
\\s* - 0+ whitespace characters
See the regex demo and an R demo:
data <- data.frame(c("1_a-b","2: b-c","3_c-d"))
colnames(data) <- "ABC"
if(any(grepl(":|_", data$ABC))){
data$ABC <- sub(".*[_:]\\s*", "", data$ABC)
}
Output of data:
ABC
1 a-b
2 b-c
3 c-d
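To make the difference between the two patterns above concrete, here is a minimal base R sketch. The vector x is a hypothetical input (not from the question) in which each string contains more than one delimiter. It also shows that an alternation has to be written inside a single pattern string rather than applied to two R strings with |, and that sub leaves strings without a match unchanged, so the grepl guard is probably optional for this particular task.

```r
# Hypothetical input: more than one delimiter per string, plus one with none
x <- c("1_a:b-c", "2:b_c-d", "no-delimiter")

# Greedy .* removes everything up to the LAST _ or :
greedy <- sub(".*[_:]\\s*", "", x)

# Lazy .*? removes everything up to the FIRST _ or :
lazy <- sub(".*?[_:]\\s*", "", x)

# Alternation inside ONE pattern string; here it behaves like the class [_:]
alt <- sub(".*(_|:)\\s*", "", x)

print(greedy)  # "b-c"   "c-d"   "no-delimiter"
print(lazy)    # "a:b-c" "b_c-d" "no-delimiter"
```

On the data from the question both patterns give the same result, because each string there contains only one delimiter.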
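How about this, using the stringr package (together with dplyr, which provides mutate and if_else)?
library(dplyr)
library(stringr)
# df is the data frame from the question (called data there)
df <- data.frame(ABC = c("1_a-b", "2: b-c", "3_c-d"))
df %>%
  mutate(
    ABC = as.character(ABC),
    # keep the "x-y" part when it is present, otherwise leave the value as is
    new = if_else(
      str_detect(ABC, "\\w\\-\\w"),
      str_extract(ABC, "\\w\\-\\w"),
      ABC
    )
  )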
ABC new
1 1_a-b a-b
2 2: b-c b-c
3 3_c-d c-d
Changed to include an if-else statement - missed that you're interested in checking for that sequence.
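For readers who would rather avoid extra packages, the same extract-with-fallback idea can be sketched in base R with regexpr and regmatches. This is only an illustration under assumptions: the vector x is hypothetical (the question's values plus one string with no match), and the pattern is the one used above, which captures a single word character on each side of the hyphen, so longer tokens such as "ab-cd" would be cut down to "b-c".

```r
# Hypothetical input: the question's values plus one string with no "x-y" part
x <- c("1_a-b", "2: b-c", "3_c-d", "plain")

# regexpr gives the position of the first match, or -1 when there is none
m <- regexpr("\\w-\\w", x)

# Start from the original values and overwrite only the elements that matched;
# regmatches returns matches for the matching elements only
out <- x
out[m != -1] <- regmatches(x, m)

print(out)  # "a-b" "b-c" "c-d" "plain"
```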