
Selective replace string in R

I have a vector of strings. Most of the elements consist of a month name or abbreviation with numbers on either side. I want to replace "September" and its abbreviations in each string with "Sep" while keeping the numbers. This is what I tried with the stringr package:

my.data <- c("01Sept2019", "05sep2019", "4September2019", "8sep2019",
              "12oct2019", "4Jun2018", "17Mar2017", "09May2015", "13Sep19")

library(stringr)    
my.data %>% str_replace_all("(?i)Sept?(ember)?[0-9]", "Sep") 
#>  [[1]]
#>    [1] "01Sep019", "05Sep019", "4Sep019", "8Sep019", "13Sep9"

This is what I would like to obtain:

#> [1] "01Sep2019", "05Sep2019", "4Sep2019", "8Sep2019", "13Sep19"

Can someone please help me out? Thanks.

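In base R you can use sub with the pattern [Ss]ep[[:alpha:]]* to find September and its abbreviations and replace them with Sep. The pattern matches "Sep" or "sep" plus any letters that follow, so the digits are never consumed.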

sub("[Ss]ep[[:alpha:]]*", "Sep", my.data)
#[1] "01Sep2019" "05Sep2019" "4Sep2019"  "8Sep2019"  "12oct2019" "4Jun2018" 
#[7] "17Mar2017" "09May2015" "13Sep19"  
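The attempt in the question loses a digit because the trailing [0-9] is part of the match, so it is replaced along with the month. One way to keep that pattern is to capture the digit and put it back with a backreference. This is a minimal sketch that assumes the same my.data vector; the name fixed is only illustrative.

```r
my.data <- c("01Sept2019", "05sep2019", "4September2019", "8sep2019",
             "12oct2019", "4Jun2018", "17Mar2017", "09May2015", "13Sep19")

# Group 1 is the optional "ember"; group 2 is the first digit after the month.
# "\\2" in the replacement writes that digit back, so nothing is lost.
fixed <- sub("sept?(ember)?([0-9])", "Sep\\2", my.data, ignore.case = TRUE)
print(fixed)
```

Elements without a September variant contain no match and are returned unchanged.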

To match only September or one of its abbreviations directly followed by a number, you can use a lookahead, (?=\\d), which checks for a digit without consuming it:

sub("sep(t|(?=\\d))(e|(?=\\d))(m|(?=\\d))(b|(?=\\d))(e|(?=\\d))(r|(?=\\d))"
  , "Sep", my.data, ignore.case=TRUE, perl=TRUE)
#[1] "01Sep2019" "05Sep2019" "4Sep2019"  "8Sep2019"  "12oct2019" "4Jun2018" 
#[7] "17Mar2017" "09May2015" "13Sep19"  
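If only the spellings "sep", "sept" and "september" need to be recognised, rather than every truncation of the word, a shorter lookahead pattern may be enough. This is a sketch under that assumption; the names pattern and simpler are illustrative.

```r
my.data <- c("01Sept2019", "05sep2019", "4September2019", "8sep2019",
             "12oct2019", "4Jun2018", "17Mar2017", "09May2015", "13Sep19")

# "sep", optionally followed by "t" or "tember", and then a digit.
# (?=\\d) only checks that a digit follows; the digit is not replaced.
pattern <- "sep(t|tember)?(?=\\d)"
simpler <- sub(pattern, "Sep", my.data, ignore.case = TRUE, perl = TRUE)
print(simpler)

# A word that merely starts with "sep" is left alone,
# because no digit follows any of the accepted spellings.
other <- sub(pattern, "Sep", "septic", ignore.case = TRUE, perl = TRUE)
print(other)
```

The lookahead requires perl = TRUE, since the default regular expression engine used by sub does not support it.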

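An option with str_replace:

library(stringr)
library(dplyr)   # loaded here for the %>% pipe
# [^0-9]* rather than [^0-9]+, so a bare "sep" directly followed by a digit
# (e.g. "05sep2019") is also matched and normalised to "Sep"
my.data %>% 
    str_replace("(?i)(Sep[^0-9]*)", "Sep")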

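In base R, grep with value = TRUE returns only the elements that mention September. It selects them but does not replace anything:

grep("Sep|sep", my.data, value = TRUE)

Output: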

[1] "01Sept2019"     "05sep2019"      "4September2019" "8sep2019"      
[5] "13Sep19"
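Since grep only selects the matching elements, the replacement still has to be done separately. One possible way to combine the two steps is to use grepl as a logical index and apply sub to the selected elements only. This is a sketch, and the names is.sep and result are illustrative.

```r
my.data <- c("01Sept2019", "05sep2019", "4September2019", "8sep2019",
             "12oct2019", "4Jun2018", "17Mar2017", "09May2015", "13Sep19")

# TRUE for every element that contains "sep" in any letter case.
is.sep <- grepl("sep", my.data, ignore.case = TRUE)

# Replace the month only in the selected elements; the rest stay as they are.
result <- my.data
result[is.sep] <- sub("sep[[:alpha:]]*", "Sep", my.data[is.sep], ignore.case = TRUE)
print(result)
```

Because sub already leaves non-matching strings untouched, the index is not strictly needed here; it is mainly useful when the matching elements also have to be reported or processed separately.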
