
Remove special characters and numbers from a column in R

I have a df which looks like this -

df <- data.frame(c = c('X.Int.2', 'BI', 'X.Int..4', 'BI.4', 'X.Int.6'),
                 d = sample(1:5, replace=T))

I am trying to remove all special characters, the 'X' and the numbers from column c.

I have tried

df %>%
  mutate(c = gsub("\\s[0-9()]+", '', c))

and

df %>%
    mutate(c = str_extract_all(c, "field:[a-zA-Z]+"))

Neither throws an error, but the first doesn't change the df and the second empties the column.

I'm clearly missing something obvious.

I'm hoping for -

c <- c('Int', 'BI', 'Int', 'BI', 'Int')
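Why the two attempts leave the column unchanged or empty can be checked directly on the same values. A minimal sketch; the `grepl()` and `regmatches()`/`gregexpr()` diagnostics are my choice and not from the question, and `regmatches()` is used here as a base R stand-in for `stringr::str_extract_all()`:

```r
x <- c('X.Int.2', 'BI', 'X.Int..4', 'BI.4', 'X.Int.6')

# Attempt 1: "\\s" demands a whitespace character right before the digits.
# None of the values contains whitespace, so the pattern never matches
# and gsub() returns the input unchanged.
grepl("\\s[0-9()]+", x)
#> [1] FALSE FALSE FALSE FALSE FALSE
identical(gsub("\\s[0-9()]+", "", x), x)
#> [1] TRUE

# Attempt 2: the pattern looks for the literal text "field:", which does
# not occur, so every element gets an empty (length-0) match vector.
lengths(regmatches(x, gregexpr("field:[a-zA-Z]+", x)))
#> [1] 0 0 0 0 0
```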

In base R, you can use gsub:

gsub('[X.0-9]', '', df$c)
#> [1] "Int" "BI"  "Int" "BI"  "Int"

This removes the characters "X" and "." and all digits from column c.
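Inside square brackets the `.` is a literal dot, so `[X.0-9]` drops every X, every dot and every digit wherever they appear. That works for this data, but an X in the middle of a name would also disappear. A sketch using a made-up value `'X.MAX.3'` (hypothetical, not in the question) and an alternative pattern that strips `X.` only at the start of the string:

```r
x <- c('X.Int.2', 'BI', 'X.Int..4', 'BI.4', 'X.Int.6')

# Character class: removes every "X", "." and digit anywhere in the string,
# including the X inside "MAX".
gsub('[X.0-9]', '', 'X.MAX.3')
#> [1] "MA"

# Anchored alternative: drop "X." only at the start, then any "." or digit.
gsub('^X\\.|[.0-9]', '', 'X.MAX.3')
#> [1] "MAX"
gsub('^X\\.|[.0-9]', '', x)
#> [1] "Int" "BI"  "Int" "BI"  "Int"
```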

With stringr, remove "X", "." and digits:

library(stringr)
str_remove_all(df$c, "[X.]|[:digit:]")
#> [1] "Int" "BI"  "Int" "BI"  "Int"

Inside mutate():

library(dplyr)

df %>% 
  mutate(c = str_remove_all(c, "[X.]|[:digit:]"))
#>     c d
#> 1 Int 4
#> 2  BI 1
#> 3 Int 2
#> 4  BI 3
#> 5 Int 5
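If dplyr isn't available, the same column update can be done in base R. A minimal sketch using `transform()` (my substitution for `mutate()`, not from the answers above) together with the character-class pattern from the base R answer:

```r
df <- data.frame(c = c('X.Int.2', 'BI', 'X.Int..4', 'BI.4', 'X.Int.6'),
                 d = sample(1:5, replace = TRUE))

# transform() evaluates c = ... using the columns of df and returns a
# modified copy, much like mutate() does; d is left untouched.
df2 <- transform(df, c = gsub('[X.0-9]', '', c))
df2$c
#> [1] "Int" "BI"  "Int" "BI"  "Int"
```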

Another option with gsub

gsub("[X.\\d]+", "", df$c, perl=TRUE)
#[1] "Int" "BI"  "Int" "BI"  "Int"
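A note on this pattern: inside a character class, `+` is an ordinary character rather than a quantifier, so `[X.\\d+]` would also delete literal plus signs. A quick comparison on a hypothetical value `'BI+4'` (not from the question):

```r
# "+" inside [...] is literal, so the plus sign is removed along with the digit.
gsub("[X.\\d+]", "", "BI+4", perl = TRUE)
#> [1] "BI"

# With the quantifier outside the class, runs of X, "." and digits are
# removed and the plus sign is kept.
gsub("[X.\\d]+", "", "BI+4", perl = TRUE)
#> [1] "BI+"
```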
