
R: combinatorial string replacement

I am looking for a gsub-based function that does combinatorial string replacement, so that given an arbitrary number of string replacement rules

replrules=list("<x>"=c(3,5),"<ALK>"=c("hept","oct","non"),"<END>"=c("ane","ene"))

and a target string

string="<x>-methyl<ALK><END>"

it would return a data frame with the final string (name) and the substitutions that were made, as in:

name                x        ALK     END
3-methylheptane     3        hept    ane
5-methylheptane     5        hept    ane
3-methyloctane      3        oct     ane
5-methyloctane      5        ...     ...
3-methylnonane      3
5-methylnonane      5
3-methylheptene     3
5-methylheptene     5
3-methyloctene      3
5-methyloctene      5
3-methylnonene      3
5-methylnonene      5
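For reference, the row order in this table (x varying fastest, then ALK, then END) is exactly the order that base R's expand.grid() produces, because its first argument varies fastest. A minimal check with the rules above (the variable name grid is just for illustration):

```r
# Replacement rules from the question
replrules <- list("<x>" = c(3, 5),
                  "<ALK>" = c("hept", "oct", "non"),
                  "<END>" = c("ane", "ene"))

# One row per combination of values; the first rule varies fastest
grid <- expand.grid(replrules, stringsAsFactors = FALSE)

nrow(grid)          # 2 * 3 * 2 = 12 combinations
grid[c(1, 3, 7), ]  # same values as rows 1, 3 and 7 of the table
```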

The target string can have an arbitrary structure, e.g. it could also be string="1-<ALK>anol", and each pattern could occur several times, as in string="<ALK>anedioic acid, di<ALK>yl ester".

What would be the most elegant way to do this kind of thing in R?

How about

d <- do.call(expand.grid, replrules)

d$name <- paste0(d$'<x>', "-", "methyl", d$'<ALK>', d$'<END>')
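The paste0() call above hard-codes the layout of one template. A gsub()-based sketch that fills an arbitrary template from the same expand.grid() idea could look like this. combn_replace is a hypothetical helper name, not an existing function, and it assumes that every occurrence of a placeholder should receive the same value:

```r
replrules <- list("<x>" = c(3, 5),
                  "<ALK>" = c("hept", "oct", "non"),
                  "<END>" = c("ane", "ene"))

# Hypothetical helper (not from any package): fill a template with every
# combination of the rules whose placeholder occurs in it
combn_replace <- function(string, rules) {
  # keep only the placeholders that actually occur in the template
  used <- rules[vapply(names(rules), grepl, logical(1),
                       x = string, fixed = TRUE)]
  d <- expand.grid(used, stringsAsFactors = FALSE)
  name <- rep(string, nrow(d))
  for (p in names(used)) {
    # gsub() takes a single replacement, so apply it row by row;
    # fixed = TRUE treats "<ALK>" etc. as literal text, and every
    # occurrence of the placeholder gets the same value
    name <- mapply(function(s, r) gsub(p, r, s, fixed = TRUE),
                   name, d[[p]], USE.NAMES = FALSE)
  }
  names(d) <- gsub("<|>", "", names(d))
  data.frame(name = name, d, stringsAsFactors = FALSE)
}

combn_replace("<x>-methyl<ALK><END>", replrules)
combn_replace("1-<ALK>anol", replrules)
```

Dropping unused rules first avoids duplicated names; for example, "1-<ALK>anol" yields 3 rows instead of 12.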


EDIT

This seems to work (substitute each of these strings into the strsplit call):

string = "<x>-methyl<ALK><END>"
string2 = "<x>-ethyl<ALK>acosane"
string3 = "1-<ALK>anol"

Using Richard's regex:

d <- do.call(expand.grid, c(replrules, stringsAsFactors = FALSE))
names(d) <- gsub("<|>", "", names(d))

s <- strsplit(string3, "(<|>)", perl = TRUE)[[1]]

out <- list()

for (i in s) {
  # placeholder names become the matching column of d; literal text is kept as-is
  out[[i]] <- if (i %in% names(d)) d[[i]] else i
}

d$name <- do.call(paste0, unname(out))
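The strsplit() step is what lets this approach handle arbitrary templates. For two of the example strings it produces the following pieces (the comments show what R returns):

```r
string  <- "<x>-methyl<ALK><END>"
string3 <- "1-<ALK>anol"

# Splitting on "<" or ">" turns each placeholder into a bare name ("x", "ALK", ...)
# and keeps the literal text between them. A leading "<" and adjacent
# placeholders ("><") leave empty strings, which add nothing when pasted back.
strsplit(string, "(<|>)", perl = TRUE)[[1]]   # "" "x" "-methyl" "ALK" "" "END"
strsplit(string3, "(<|>)", perl = TRUE)[[1]]  # "1-" "ALK" "anol"
```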


EDIT

This version should also work when a placeholder appears more than once:

d <- do.call(expand.grid, c(replrules, stringsAsFactors = FALSE))
names(d) <- gsub("<|>", "", names(d))

string4 = "<x>-methyl<ALK><END>oate<ALK>"

s <- strsplit(string4, "(<|>)", perl = TRUE)[[1]]
out <- list()
for (i in seq_along(s)) {
  # index by position so a repeated placeholder does not overwrite an earlier piece
  out[[i]] <- if (s[i] %in% names(d)) d[[s[i]]] else s[i]
}
d$name <- do.call(paste0, out)
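With this loop, repeated placeholders share one value per row (both <ALK> pieces of string4 take the same value). The question does not say whether repeats should vary together or independently. If they should vary independently, a sketch that gives every occurrence its own column could look like this. expand_independent is a hypothetical name, and the code assumes a well-formed template in which every "<...>" is a placeholder with a rule and "<" / ">" appear nowhere else:

```r
replrules <- list("<x>" = c(3, 5),
                  "<ALK>" = c("hept", "oct", "non"),
                  "<END>" = c("ane", "ene"))

# Hypothetical helper: every occurrence of a placeholder gets its own column,
# so repeated placeholders vary independently of each other
expand_independent <- function(string, rules) {
  s <- strsplit(string, "(<|>)", perl = TRUE)[[1]]
  # the split alternates text, name, text, name, ... so names sit at even positions
  is_key <- seq_along(s) %% 2 == 0
  keys <- s[is_key]
  # one column per occurrence: ALK, ALK.1, ...
  grid <- expand.grid(setNames(rules[paste0("<", keys, ">")], make.unique(keys)),
                      stringsAsFactors = FALSE)
  pieces <- as.list(s)
  pieces[is_key] <- unname(as.list(grid))  # each placeholder -> its own column
  grid$name <- do.call(paste0, pieces)
  grid
}

expand_independent("<ALK>anedioic acid, di<ALK>yl ester", replrules)
```

For that template this gives 3 * 3 = 9 rows instead of 3.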

Well, I'm not exactly sure we can even produce a "correct" answer to your question, but hopefully this helps give you some ideas.

Okay, so in s, I split the string after every "l" or ">" (a zero-width lookbehind), which in this particular string separates the placeholders from the literal text. Then g gets the first value of each element of replrules. Then I constructed a data frame as an example, so dat is a one-row example of what the result would look like.

> (s <- strsplit(string, "(?<=l|\\>)", perl = TRUE)[[1]])
# [1] "<x>"     "-methyl" "<ALK>"   "<END>"  
> g <- sapply(replrules, "[", 1)
> dat <- data.frame(name = paste(append(g, s[2], after = 1), collapse = ""))
> dat[2:4] <- g
> names(dat)[2:4] <- sapply(strsplit(names(g), "<|>"), "[", -1)
> dat
#              name x  ALK END
# 1 3-methylheptane 3 hept ane
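The split regex above is tuned to this particular string (it relies on "-methyl" ending in "l"). A sketch of the same first-value idea that instead locates the placeholders with gregexpr() and fills them with base R's regmatches<-, then repeats it for every combination from expand.grid(), could look like this (filled, placeholders and grid are illustrative names):

```r
replrules <- list("<x>" = c(3, 5),
                  "<ALK>" = c("hept", "oct", "non"),
                  "<END>" = c("ane", "ene"))
string <- "<x>-methyl<ALK><END>"
g <- sapply(replrules, "[", 1)

# Locate every <...> placeholder instead of splitting after "l" or ">"
m <- gregexpr("<[^<>]+>", string)
placeholders <- regmatches(string, m)[[1]]

# Replace the matches, in order, with the first value of each rule
filled <- string
regmatches(filled, m) <- list(unname(g[placeholders]))

# The same replacement for every combination of values
grid <- expand.grid(replrules, stringsAsFactors = FALSE)
grid$name <- vapply(seq_len(nrow(grid)), function(i) {
  out <- string
  regmatches(out, m) <- list(as.character(unlist(grid[i, placeholders])))
  out
}, character(1))
```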
