Using gsub to remove the part of a string before any of several symbols in R

I am trying to use gsub to remove everything before any of several symbols in a data frame column in R. Let me explain with an example:

data <- data.frame(c("1_a-b","2: b-c","3_c-d"))
colnames(data) <- "ABC"

I want the final dataset to look like:

data <- data.frame(c("a-b","b-c","c-d"))
colnames(data) <- "ABC"

I am doing this:

if(any(grepl(":|_", data$ABC))){
      data$ABC <- gsub(".*_", "", data$ABC)
    } 

I tried using the pipe "|" to add another condition, like this:

if(any(grepl(":|_", data$ABC))){
      data$ABC <- gsub(".*_"|".*:", "", data$ABC)
    } 

But it doesn't work. Is there a way to do this in one step? I also have to check whether the column contains these symbols at all, hence the grepl.

You can use the following regex if you need to remove everything up to the last _ or : (plus any whitespace right after it):

sub(".*[_:]\\s*", "", data$ABC)

Or, if you need to remove everything up to the first _ or : instead:

sub(".*?[_:]\\s*", "", data$ABC)

Pattern details:

  • .*? - any 0 or more chars, as few as possible (.* matches 0 or more chars, as many as possible)
  • [_:] - a _ or a :
  • \\s* - 0 or more whitespace chars.
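
The two patterns only give different results when a string holds more than one separator. Here is a minimal sketch of that difference. The input vector x is made up for illustration and is not part of the question's data:

```r
# Hypothetical input: the first element contains several separators
x <- c("1_a_b:c-d", "3_c-d")

# Greedy .* runs to the LAST _ or :, so everything up to it is removed
greedy <- sub(".*[_:]\\s*", "", x)

# Lazy .*? stops at the FIRST _ or :
lazy <- sub(".*?[_:]\\s*", "", x)

print(greedy)  # "c-d" "c-d"
print(lazy)    # "a_b:c-d" "c-d"
```

With a single separator per string, as in the question's data, both patterns return the same result.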

R demo:

data <- data.frame(c("1_a-b","2: b-c","3_c-d"))
colnames(data) <- "ABC"
if(any(grepl(":|_", data$ABC))){
   data$ABC <- sub(".*[_:]\\s*", "", data$ABC)
} 

Output of data:

  ABC
1 a-b
2 b-c
3 c-d
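
Two side notes, as a hedged sketch rather than part of the answer above. First, the grepl() guard appears to be optional here, because sub() returns elements without a match unchanged. Second, the attempt in the question fails because `|` between two R strings is R's logical OR operator, which raises an error for character operands; regex alternation has to be written inside a single pattern string. The value "x-y" below is a made-up element with no separator in it:

```r
# "x-y" is a hypothetical extra value with no _ or : in it
abc <- c("1_a-b", "2: b-c", "3_c-d", "x-y")

# No grepl() guard: elements without a match come back unchanged
cleaned <- sub(".*[_:]\\s*", "", abc)

# Alternation written inside ONE pattern string, as the question intended
alternation <- gsub(".*_|.*:\\s*", "", abc)

print(cleaned)      # "a-b" "b-c" "c-d" "x-y"
print(alternation)  # same result
```

The character class [_:] is the shorter way to say "either symbol"; the alternation form is shown only because it is closest to what the question tried.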

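How about this, using the stringr package (together with dplyr for mutate() and if_else())?

library(dplyr)    # mutate(), if_else(), %>%
library(stringr)  # str_detect(), str_extract()

# df is the data frame from the question (called data there)
df <- data.frame(ABC = c("1_a-b", "2: b-c", "3_c-d"))

df %>% 
  mutate(
    ABC = as.character(ABC),
    # keep the "x-y" part when it is present, otherwise leave the value as is
    new = if_else(
      str_detect(ABC, "\\w\\-\\w"),
      str_extract(ABC, "\\w\\-\\w"),
      ABC
    )
  )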

     ABC new
1  1_a-b a-b
2 2: b-c b-c
3  3_c-d c-d

Edit: I added the if_else() check because I had missed that you also want to test whether that sequence is present.
