strsplit a string into characters, but keeping digraphs
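How can I split an English word into characters while keeping digraphs (e.g. "ch", "th", "gh") intact?
For example, I want the string "that" to be split into "th", "a", "t" rather than "t", "h", "a", "t".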
Here is a function f that may help with the splitting:
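dg <- c("ch", "th", "gh", "ai")  # digraphs to keep together
v <- c("thanks", "chain", "banana", "that", "rain")
# SIMPLIFY = FALSE always returns a list, even if all results have equal length
f <- Vectorize(function(s) {
  res <- character(0)
  while (nchar(s) > 0) {
    # take two characters if the string starts with a digraph, otherwise one
    k <- if (substr(s, 1, 2) %in% dg) 2 else 1
    res <- c(res, substr(s, 1, k))
    s <- substr(s, k + 1, nchar(s))
  }
  res
}, SIMPLIFY = FALSE)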
and you will see
> f(v)
$thanks
[1] "th" "a" "n" "k" "s"
$chain
[1] "ch" "ai" "n"
$banana
[1] "b" "a" "n" "a" "n" "a"
$that
[1] "th" "a" "t"
$rain
[1] "r" "ai" "n"
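For the single word "that", a lookbehind pattern with strsplit is another option. Note that this pattern is written for this one word only:
strsplit("that", split = "(?<=t(?!h)|th|a)", perl = TRUE)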
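A regex approach can also be generalized to the whole digraph list by matching tokens instead of splitting on boundaries. The following is only a sketch, not taken from the answers above: `split_digraphs` is a hypothetical helper name, and it assumes the digraphs contain no regex metacharacters (they would need escaping otherwise).

```r
dg <- c("ch", "th", "gh", "ai")
v <- c("thanks", "chain", "banana", "that", "rain")

# Alternation tries each digraph first, then falls back to any single character.
pattern <- paste(c(dg, "."), collapse = "|")

# split_digraphs is a hypothetical helper name used for illustration only.
split_digraphs <- function(x, pattern) {
  # gregexpr finds every token; regmatches extracts them, one vector per word
  setNames(regmatches(x, gregexpr(pattern, x, perl = TRUE)), x)
}

res <- split_digraphs(v, pattern)
```

Because the digraphs come before `.` in the alternation, each position consumes a digraph when one is present and a single character otherwise, which mirrors what the loop in f does.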