
gsubfn | Replace text using variables in Substitution

I am trying to remove a block of text that wraps around the part I want to keep. Because the text can be long, I want to hold the pieces to remove in variables. Here is an example of what I am trying to do; it does not remove the text:

Text<-'This is an example text [] test' 
topheader<-'This'
bottomheader<-'test'


gsubfn(".", list(topheader = "", bottomheader = ""), Text)
[1] "This is an example text [] test"


Goal: "is an example text []" 

I think this is one solution to what you're looking for:

# Your data:
Text<-'This is an example text [] test' 
topheader<-'This'
bottomheader<-'test'

# A possible solution fn (note: this definition masks gsubfn::gsubfn if that package is loaded)
gsubfn <- function(text, th, bh, th.replace="", bh.replace="") {
  answer <- gsub(text,
                 pattern=paste0(th," (.*) ",bh), 
                 replacement=paste0(th.replace,"\\1",bh.replace)
                 )
  return(answer)
  }

# Your req'd answer
gsubfn(text=Text,th=topheader,bh=bottomheader)

# Another example
gsubfn(text=Text,th=topheader,bh=bottomheader,th.replace="@@@ ",bh.replace=" ###")

You can just collapse your search words into a regex string.

Test <- 'This is an example text testing [] test'

top <- "This"
bottom <- "test"

arg <- c(top, bottom)
arg <- paste(arg, collapse="|")
arg <- gsub("(\\w+)", "\\\\b\\1\\\\b", arg)

Test.c <- gsub(arg, "", Test)
Test.c <- gsub("[ ]+", " ", Test.c)
Test.c <- gsub("^[[:space:]]|[[:space:]]$", "", Test.c)
Test.c
# "is an example text testing []"

Or using magrittr pipes

library(magrittr)

c(top, bottom) %>%
paste(collapse="|") %>%
gsub("(\\w+)", "\\\\b\\1\\\\b", .) %>%
gsub(., "", Test) %>%
gsub("[ ]+", " ", .) %>%
gsub("^[[:space:]]|[[:space:]]$", "", .) -> Test.c
Test.c
# "is an example text testing []"

Or using a loop

Test.c <- Test
words <- c(top, bottom)
for (i in words) {
    # "\\b" is a word boundary; update Test.c so each removal builds on the last
    Test.c <- gsub(paste0("\\b", i, "\\b"), "", Test.c)
}
Test.c <- gsub("[ ]+", " ", Test.c)
Test.c <- gsub("^[[:space:]]|[[:space:]]$", "", Test.c)
Test.c
# "is an example text testing []"

1) gsubfn There are several problems here:

  • The regular expression in gsubfn (and in gsub) must match the string you want to process, but a dot matches only a single character, so it can never match This or test, which are 4-character strings. Use "\\w+" instead.

  • In list(a = x) the name a is taken literally; it is not evaluated as a variable. Write out the names explicitly, or use setNames if they are held in variables.
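
The second point can be checked in base R without loading gsubfn. This is only a minimal sketch reusing topheader and bottomheader from the question; the names literal and by_value are made up for illustration.

```r
topheader <- "This"
bottomheader <- "test"

# Names written inside list() are the literal symbols, not the variables' values
literal <- list(topheader = "", bottomheader = "")
names(literal)

# setNames() takes the names from the values held in the variables
by_value <- setNames(list("", ""), c(topheader, bottomheader))
names(by_value)
```

The first call to names() returns "topheader" and "bottomheader", which is why the list in the question never matched any word in Text.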

Thus to fix up the code in the question:

library(gsubfn)

trimws(gsubfn("\\w+", list(This = "", test = ""), Text))
## [1] "is an example text []"

or in terms of the header variables:

L <- setNames(list("", ""), c(topheader, bottomheader))
trimws(gsubfn("\\w+", L, Text))
## [1] "is an example text []"

Note that this replaces every occurrence of topheader and bottomheader, not just the ones at the start and end; however, it is the closest to your code and is likely sufficient.

2) sub Another possibility is this simple sub

sub("^This (.*) test$", "\\1", Text)
## [1] "is an example text []"

or in terms of the header variables:

pat <- sprintf("^%s (.*) %s$", topheader, bottomheader)
sub(pat, "\\1", Text)
## [1] "is an example text []"
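
If the headers might contain regex metacharacters (brackets or dots, for example), the pattern built with sprintf would need escaping. One way to sidestep that, sketched here with base R only, is to compare literal prefixes and suffixes using startsWith() and endsWith(). The helper name strip_headers is hypothetical and not part of the answers above, and it assumes a single space separates each header from the body, as in the question's example.

```r
# Hypothetical helper: strip a literal leading and trailing header (no regex)
strip_headers <- function(text, top, bottom) {
  top_sp <- paste0(top, " ")        # header plus the separating space
  bottom_sp <- paste0(" ", bottom)  # separating space plus footer
  if (startsWith(text, top_sp)) {
    text <- substr(text, nchar(top_sp) + 1, nchar(text))
  }
  if (endsWith(text, bottom_sp)) {
    text <- substr(text, 1, nchar(text) - nchar(bottom_sp))
  }
  text
}

Text <- 'This is an example text [] test'
strip_headers(Text, "This", "test")
```

Unlike the gsubfn approach in (1), this only touches the very start and end of the string, so a header word that also appears in the middle of the text is left alone.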

Update: Fixed (1)
