Replacing nth instance of a character string using sub/gsub in R

I am attempting to rename some character strings given to me in a large list. The issue is that I only need to replace some of the characters, not all of them.

exdata <- c("i_am_having_trouble_with_this_string",
            "i_am_wishing_files_were_cleaner_for_me",
            "any_help_would_be_greatly_appreciated")

From this list, for example, I would like to replace the third through the fifth instance of "_" with "-". I am having trouble understanding the regex coding for this, as most examples split strings up instead of keeping them intact.

Here are some alternative approaches. All of them can be generalized to arbitrary bounds by replacing 3 and 5 with other numbers.

1) strsplit: Split each string at the underscores and use paste to collapse the pieces back together with the appropriate separators. No packages are used.

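i <- 3
j <- 5
sapply(strsplit(exdata, "_"), function(x) {
  g <- seq_along(x)        # one group id per piece
  g[g < i] <- i            # pieces before the i-th underscore share one group
  g[g > j + 1] <- j + 1    # pieces after the j-th underscore share one group
  # join within each group using "_", then join the groups using "-"
  paste(tapply(x, g, paste, collapse = "_"), collapse = "-")
})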

giving:

[1] "i_am_having-trouble-with-this_string"  
[2] "i_am_wishing-files-were-cleaner_for_me"
[3] "any_help_would-be-greatly-appreciated" 
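
To see why the grouping works, it may help to trace the vector g for a single string. The following is a minimal sketch using the first element of exdata; the names pieces and result are introduced here for illustration only.

```r
# Trace approach 1 on one string (pieces and result are illustrative names)
x <- strsplit("i_am_having_trouble_with_this_string", "_")[[1]]
i <- 3
j <- 5

g <- seq_along(x)        # 1 2 3 4 5 6 7
g[g < i] <- i            # 3 3 3 4 5 6 7
g[g > j + 1] <- j + 1    # 3 3 3 4 5 6 6

# pieces sharing a group id are rejoined with "_"
pieces <- tapply(x, g, paste, collapse = "_")
# the groups themselves are then joined with "-"
result <- paste(pieces, collapse = "-")
print(result)
```

The pieces up to the third underscore collapse into one group and the pieces after the fifth underscore collapse into another, so only the third through fifth separators become "-".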

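2) for loop: This translates the first j occurrences of old to new in x and then translates the first i-1 occurrences of new back to old. It assumes that new does not already occur in x, since any existing occurrences would be counted in the second step. No packages are used.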

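translate <- function(old, new, x, i = 1, j) {
  if (i <= 1) {
    # replace the first j occurrences of old with new, one per pass
    if (j > 0) for (k in seq_len(j)) x <- sub(old, new, x, fixed = TRUE)
    x
  } else {
    # replace the first j occurrences, then turn the first i-1 back
    Recall(new, old, Recall(old, new, x, 1, j), 1, i - 1)
  }
}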

translate("_", "-", exdata, 3, 5)

giving:

[1] "i_am_having-trouble-with-this_string"  
[2] "i_am_wishing-files-were-cleaner_for_me"
[3] "any_help_would-be-greatly-appreciated" 
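
The two passes can also be written out by hand for one string. This is only a sketch of the idea behind the function, with step1 and step2 as illustrative names; it assumes the string contains no "-" to begin with.

```r
# Unrolled version of the two passes in approach 2 (illustrative names)
x <- "i_am_having_trouble_with_this_string"
i <- 3
j <- 5

# pass 1: turn the first j underscores into hyphens, one per sub() call
step1 <- x
for (k in seq_len(j)) step1 <- sub("_", "-", step1, fixed = TRUE)

# pass 2: turn the first i - 1 hyphens back into underscores
step2 <- step1
for (k in seq_len(i - 1)) step2 <- sub("-", "_", step2, fixed = TRUE)

print(step1)
print(step2)
```

After the first pass the string is "i-am-having-trouble-with-this_string"; the second pass restores the first two separators and leaves the third through fifth as "-".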

3) gsubfn: This uses a package but in return is substantially shorter than the others. gsubfn is like gsub except that the replacement in gsubfn can be a string, list, function or proto object. In the case of a proto object, the fun method of the proto object is invoked each time there is a match to the regular expression. Below, the matching string is passed to fun as x, and the output of fun replaces the match in the data. The proto object is automatically populated with a number of variables set by gsubfn and accessible by fun, including count, which is 1 for the first match, 2 for the second and so on. For more information see the gsubfn vignette; section 4 discusses the use of proto objects.

library(gsubfn)

p <- proto(i = 3, j = 5, 
      fun = function(this, x) if (count >= i && count <= j) "-" else x)
gsubfn("_", p, exdata)

giving:

[1] "i_am_having-trouble-with-this_string"  
[2] "i_am_wishing-files-were-cleaner_for_me"
[3] "any_help_would-be-greatly-appreciated" 
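
The same count-the-matches idea can be sketched without any package by using gregexpr together with the replacement form of regmatches. This is not the gsubfn mechanism itself, just a base R analogue; replace_between is a hypothetical helper name introduced here.

```r
exdata <- c("i_am_having_trouble_with_this_string",
            "i_am_wishing_files_were_cleaner_for_me",
            "any_help_would_be_greatly_appreciated")

# replace_between is a hypothetical helper, not part of base R or gsubfn
replace_between <- function(x, i, j, old = "_", new = "-") {
  m <- gregexpr(old, x, fixed = TRUE)        # positions of every match
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    k <- seq_along(hits)                     # match counter: 1, 2, 3, ...
    hits[k >= i & k <= j] <- new             # swap only matches i through j
    hits
  })
  x
}

res <- replace_between(exdata, 3, 5)
print(res)
```

Here the counter k plays the role that count plays in the proto object: each match is either kept as is or replaced depending on its position.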
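
Another option is a single gsub call with capture groups and backreferences. No packages are used.

gsub('(.*_.*_.*?)_(.*?)_(.*?)_(.*)', '\\1-\\2-\\3-\\4', exdata)

giving:

[1] "i_am_having-trouble-with-this_string"  
[2] "i_am_wishing-files-were-cleaner_for_me"
[3] "any_help_would-be-greatly-appreciated" 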
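
The pattern above mixes greedy and lazy quantifiers, which can be hard to reason about. A variant that avoids this is sketched below: it anchors at the start of the string and uses negated character classes so that each group can only span text between underscores. The pattern and the use of perl = TRUE are choices made for this sketch, not part of the original answer.

```r
exdata <- c("i_am_having_trouble_with_this_string",
            "i_am_wishing_files_were_cleaner_for_me",
            "any_help_would_be_greatly_appreciated")

# group 1: everything up to (not including) the 3rd underscore
# groups 2 and 3: the text between underscores 3-4 and 4-5
pattern <- "^((?:[^_]*_){2}[^_]*)_([^_]*)_([^_]*)_"

# sub() is enough because the anchored pattern can match only once per string
res <- sub(pattern, "\\1-\\2-\\3-", exdata, perl = TRUE)
print(res)
```

One difference in behavior to be aware of: a string with fewer than five underscores does not match this pattern at all, so it is returned unchanged rather than partially converted.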
