R: show matched special character in a string

How can I show which special character was a match in each row of the single column dataframe?

Sample dataframe:

a <- data.frame(name=c("foo","bar'","ip_sum","four","%23","2_planet!","@abc!!"))

Determining whether the string has a special character:

library(dplyr)

a$name_cleansed <- gsub("([-./&,])|[[:punct:]]","\\1",a$name) # \\1 puts back the exceptions we define (dash, dot, slash, ampersand, comma)

a <- a %>% mutate(has_special_char = name != name_cleansed)

You can use str_extract if you want only the first special character.

library(stringr)
str_extract(a$name,'[[:punct:]]')
#[1] NA  "'" "_" NA  "%" "_" "@"

If we need all of the special characters, we can use str_extract_all.

sapply(str_extract_all(a$name,'[[:punct:]]'), function(x) toString(unique(x)))
#[1] ""     "'"    "_"    ""     "%"    "_, !" "@, !"

To exclude certain symbols, we can use setdiff on the extracted characters:

exclude_symbol <- c('-', '.', '/', '&', ',')

sapply(str_extract_all(a$name,'[[:punct:]]'), function(x) 
                       toString(setdiff(unique(x), exclude_symbol)))

We can use grepl here for a base R option:

a$has_special_char <- grepl("(?![-./&,])[[:punct:]]", a$name, perl=TRUE)
a$special_char <- ifelse(a$has_special_char, sub("^.*([[:punct:]]).*$", "\\1", a$name), NA)
a

       name has_special_char special_char
1       foo            FALSE         <NA>
2      bar'             TRUE            '
3    ip_sum             TRUE            _
4      four            FALSE         <NA>
5       %23             TRUE            %
6 2_planet!             TRUE            !
7    @abc!!             TRUE            !

Data:

a <- data.frame(name=c("foo","bar'","ip_sum","four","%23","2_planet!","@abc!!"))

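The above logic returns the last symbol character, if present, in each name, otherwise returning NA. The leading .* in the sub() pattern is greedy, so the capture group lands on the final punctuation character, as rows 6 and 7 of the output show. It reuses the has_special_char column to determine whether a symbol occurs in the name. Note that the excluded symbols are honored only by the grepl() check, not by the extraction in sub().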
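To make the first-versus-last distinction concrete, here is a minimal base R sketch. The vector x and the variable names are illustrative and not part of the original answer; the second pattern is one possible way to capture the first symbol instead.

```r
x <- c("foo", "2_planet!", "@abc!!")  # illustrative subset of the sample names

# Greedy ".*" consumes as much as possible, so the capture is the LAST symbol.
last_sym <- sub("^.*([[:punct:]]).*$", "\\1", x)

# Skipping only non-punctuation first makes the capture the FIRST symbol.
first_sym <- sub("^[^[:punct:]]*([[:punct:]]).*$", "\\1", x)

# sub() returns the input unchanged when nothing matches, so mask those rows.
has_sym <- grepl("[[:punct:]]", x)
last_sym[!has_sym] <- NA
first_sym[!has_sym] <- NA

first_sym
last_sym
```

For "2_planet!" the first pattern yields "_" while the greedy one yields "!".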

Edit:

If you want a column which shows all special characters, then use:

a$all_special_char <- ifelse(a$has_special_char, gsub("[^[:punct:]]+", "", a$name), NA)
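The gsub() call above keeps every punctuation character, including repeats and the excluded symbols. As a hedged alternative, the sketch below reuses the lookahead pattern from grepl() with regmatches() and gregexpr() so that the exclusion is also applied to the extracted characters. The input vector x is hypothetical, chosen so that it actually contains excluded symbols.

```r
x <- c("foo", "a-b!", "x.y/z", "@abc!!")  # hypothetical names

# Same pattern as in the grepl() call: punctuation that is not - . / & ,
pattern <- "(?![-./&,])[[:punct:]]"

# Extract every match per string; strings without a match give character(0).
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

# Collapse the unique matches per string; use NA where nothing was found.
all_special <- vapply(hits, function(m) toString(unique(m)), character(1))
all_special[all_special == ""] <- NA

all_special
```

Here "x.y/z" ends up as NA because it contains only excluded symbols.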

Base R regex solution using a negated bracket expression ("[^...]") to delete everything that is not punctuation, together with the excluded symbols:

gsub("[-./&,]|[^[:punct:]]", "", a$name)
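A note on the caret: inside a bracket expression a leading ^ negates the set, while outside brackets it anchors the match to the start of the string. The sketch below, using hypothetical inputs that contain excluded symbols, compares a pattern written as "(^[-./&,])" with one written as "[-./&,]".

```r
x <- c("-a.b!", "a-b!")  # hypothetical inputs containing excluded symbols

# "^" outside brackets is an anchor: only a leading excluded symbol is removed.
anchored <- gsub("(^[-./&,])|[^[:punct:]]", "", x)

# Without the anchor, excluded symbols are removed wherever they occur.
anywhere <- gsub("[-./&,]|[^[:punct:]]", "", x)

anchored
anywhere
```

With the anchored form, the "." in the first string and the "-" in the second survive, which is probably not what is wanted when those symbols are meant to be ignored.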

Also if you want a data.frame returned:

within(a, {
  # keep only punctuation, dropping the excluded symbols - . / & ,
  special_char <- gsub("[-./&,]|[^[:punct:]]", "", name); 
  has_special_char <- special_char != ""})

If you only want unique special characters per name as in @Ronak Shah's answer:

within(a, {
    # keep only punctuation (minus the excluded symbols), then de-duplicate per name
    special_char <- sapply(gsub("[-./&,]|[^[:punct:]]", "", name),
                           function(x){toString(unique(unlist(strsplit(x, ""))))},
                           USE.NAMES = FALSE);
    has_special_char <- special_char != ""
  })
