
Substitute a captured non-ASCII letter with its upper case

Is it possible to replicate the following using only regex in base R (i.e. only the g*sub() functions)...

sub("(i)", "\\U\\1", "string", perl = TRUE)
# [1] "strIng"

For non-ASCII letters?

# Hoped-for output ("?" is a placeholder for the replacement I can't work out)
sub("(í)", "?", "stríng", perl = TRUE)
# [1] "strÍng"

P.S. R's regex flavours are TRE (the default) and PCRE (perl = TRUE).

P.P.S. I'm using R 4.2.1, and Sys.getlocale() gives:

[1] "LC_COLLATE=Icelandic_Iceland.utf8;LC_CTYPE=Icelandic_Iceland.utf8;LC_MONETARY=Icelandic_Iceland.utf8;LC_NUMERIC=C;LC_TIME=Icelandic_Iceland.utf8"

A slightly more involved but explicit solution that uses only base R:

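sub_nascii <- function(pattern, string) {
  # Character positions and lengths of every match in the (single) input string
  matches <- gregexpr(pattern, string)[[1]]
  lens <- attr(matches, "match.length")
  if (matches[1] == -1) return(string)  # gregexpr() returns -1 when nothing matches

  for (k in seq_along(matches)) {
    last <- matches[k] + lens[k] - 1
    # Upper-case the whole matched substring in place
    substr(string, matches[k], last) <- toupper(substr(string, matches[k], last))
  }
  string
}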

sub_nascii(pattern = "í", "stríng")

This works in my locale, where sub() on its own doesn't.
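Why the substr() loop is safe with a multibyte letter: gregexpr() reports match positions in characters rather than bytes (unless useBytes = TRUE), which is what substr() expects. A small check, assuming a UTF-8 string; "í" is written as the escape \u00ed so the example does not depend on the encoding of the script file:

```r
# "í" written as a \u escape so the string is UTF-8 regardless of file encoding
x <- "str\u00edng"

nchar(x, type = "chars")   # 6 characters
nchar(x, type = "bytes")   # 7 bytes: "í" takes two bytes in UTF-8

pos <- gregexpr("\u00ed", x)[[1]]
pos[1]                      # 4: a character index, not a byte offset
substr(x, pos[1], pos[1])   # the single character "í"
```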

You can use

x <- "stríng"
gr <- gregexpr("í", x)                      # locate every match
mat <- regmatches(x, gr)                    # extract the matched substrings
regmatches(x, gr) <- lapply(mat, toupper)   # put back upper-cased versions
x
# [1] "strÍng"

See the R demo online.
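A minimal sketch that wraps the regmatches() idea into a reusable helper for character vectors. The name upcase_matches and its first_only argument are hypothetical, not part of base R. With first_only = TRUE it uses regexpr(), so only the first match in each element changes, like sub(); otherwise it uses gregexpr() and changes every match, like gsub(). It assumes toupper() knows the upper-case form of the letter in your locale:

```r
# Hypothetical helper: upper-case regex matches in each element of x.
# first_only = TRUE mimics sub() (first match per element);
# first_only = FALSE mimics gsub() (all matches).
upcase_matches <- function(x, pattern, first_only = FALSE) {
  if (first_only) {
    m <- regexpr(pattern, x)               # one match position (or -1) per element
    if (all(m == -1)) return(x)            # nothing to replace anywhere
    regmatches(x, m) <- toupper(regmatches(x, m))
  } else {
    m <- gregexpr(pattern, x)              # list of match positions per element
    regmatches(x, m) <- lapply(regmatches(x, m), toupper)
  }
  x
}

words <- c("str\u00edng", "\u00ed\u00ed", "abc")
upcase_matches(words, "\u00ed", first_only = TRUE)  # only the first "í" per element
upcase_matches(words, "\u00ed")                     # every "í"
```

One advantage over the substr() loop: regmatches<- rebuilds each string from the unmatched pieces and the replacement values, so it does not assume the replacement has the same number of characters as the match.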
