
How to remove only a specific group of characters from both the names and values of a data frame in R

Assume this is my df:

library(dplyr)  # provides tibble() and %>%

df <- tibble(`a*` = c("_x__", "*y", "z+-"),
             b    = c("_x__", "*y", "z+-"))
> df
# A tibble: 3 x 2
  `a*`  b    
  <chr> <chr>
1 _x__  _x__ 
2 *y    *y   
3 z+-   z+-  

I want to remove the characters *, _ and + from both the column names and the values, wherever they occur, to get:

# A tibble: 3 x 2
  a     b    
  <chr> <chr>
1 x     x    
2 y     y    
3 z-    z-  

So I am using gsub(), but it only removes the first character. In fact, I am looking for a neat way to achieve both of these changes using dplyr pipes. Any hint or idea is appreciated.

df %>%
  mutate_all(funs(gsub(c("_","[*]","+"),"",.))) 


names(df) <- str_remove_all("[*]")

We can match several characters at once by putting them inside [] in the pattern passed to str_remove_all or gsub. A vector of patterns does not work in gsub, because its pattern argument is not vectorized: only the first element is used.

library(dplyr)
library(stringr)
# Strip *, _ and + from every column; .names applies the same cleanup to the column names
df <- df %>% 
   transmute(across(everything(), ~ str_remove_all(.x, "[*_+]"),
    .names = "{str_remove_all(.col, '[*_+]')}"))

Output:

df
# A tibble: 3 × 2
  a     b    
  <chr> <chr>
1 x     x    
2 y     y    
3 z-    z-   
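
To illustrate why the original gsub() call only removed the first character, here is a minimal base R sketch. The vector x and the result names first_only and all_three are made up for this example; the third pattern is written as "[+]" so that each element is a valid regular expression on its own.

```r
x <- c("_x__", "*y", "z+-")

# A vector of patterns: gsub() uses only the first element ("_")
# and warns that the remaining elements are ignored.
first_only <- suppressWarnings(gsub(c("_", "[*]", "[+]"), "", x))

# A single character class removes all three characters in one pass.
all_three <- gsub("[*_+]", "", x)

print(first_only)
print(all_three)
```

Inside a character class, * and + are treated as literal characters, so they do not need escaping there.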

This does the names as well but is pretty similar to akrun's answer:

library(dplyr)

pattern <- "\\*|\\+|_"
df  |>
    mutate(across(
        everything(),
        .fns = \(col) gsub(pattern, "", col)  # clean the values
    ))  |>
    setNames(gsub(pattern, "", names(df)))    # clean the column names
# A tibble: 3 x 2
#   a     b        
#   <chr> <chr>
# 1 x     x
# 2 y     y
# 3 z-    z-
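
As a possible variation on the same idea, the names can also be cleaned inside the pipe with dplyr's rename_with(), which avoids referring back to names(df). This is only a sketch: it assumes dplyr 1.0 or later and R 4.1 or later (for |> and the \(x) lambda syntax), and the name cleaned is chosen just for this example.

```r
library(dplyr)

df <- tibble(`a*` = c("_x__", "*y", "z+-"),
             b    = c("_x__", "*y", "z+-"))

pattern <- "[*_+]"

cleaned <- df |>
  # remove the characters from every column's values
  mutate(across(everything(), \(col) gsub(pattern, "", col))) |>
  # apply the same cleanup to every column name
  rename_with(\(nm) gsub(pattern, "", nm))

print(cleaned)
```

Because rename_with() receives the current column names, the two steps stay correct even if earlier steps in the pipe add or rename columns.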
