strsplit a string into characters, but keeping digraphs

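How can I split an English word into characters while keeping digraphs (e.g. "ch", "th", "gh") intact?

For example, for the string "that", I want to split it into "th", "a", "t" rather than "t", "h", "a", "t".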

Here is a function f that may help with the split:

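# Digraphs to keep together, and some test words
dg <- c("ch", "th", "gh", "ai")
v <- c("thanks", "chain", "banana", "that", "rain")

# Vectorize() lets f accept a character vector and return one result per word
f <- Vectorize(function(s) {
  res <- c()
  while (nchar(s) > 0) {
    # Take two characters if the string starts with a digraph, otherwise one
    k <- if (substr(s, 1, 2) %in% dg) 2 else 1
    res <- c(res, substr(s, 1, k))
    # Drop the consumed prefix and continue with the rest
    s <- substr(s, k + 1, nchar(s))
  }
  res
})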

You will see:

> f(v)
$thanks
[1] "th" "a"  "n"  "k"  "s" 

$chain
[1] "ch" "ai" "n"

$banana
[1] "b" "a" "n" "a" "n" "a"

$that
[1] "th" "a"  "t"

$rain
[1] "r"  "ai" "n"
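
Another option, for the string "that" specifically, is strsplit with a Perl-style lookbehind that splits after "th", after "a", and after any "t" not followed by "h":

strsplit("that", split = "(?<=t(?!h)|th|a)", perl = TRUE)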
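
The lookbehind above only covers the letters that occur in "that". A more general variant, sketched here under some assumptions, extracts the tokens instead of splitting on boundaries: build an alternation from dg with a single-character fallback and pull out every match with gregexpr and regmatches. The name split_digraphs is made up for this illustration, and the sketch assumes the digraphs contain no regex metacharacters (they would need escaping otherwise).

```r
dg <- c("ch", "th", "gh", "ai")
v <- c("thanks", "chain", "banana", "that", "rain")

# Digraphs are listed before the fallback "." so that, with perl = TRUE,
# the ordered alternation tries them first at each position.
pattern <- paste(c(dg, "."), collapse = "|")

# Hypothetical helper: returns a named list with one token vector per word
split_digraphs <- function(x) {
  res <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  names(res) <- x
  res
}

split_digraphs(v)
```

For the five words in v this should give the same tokens as f(v), without an explicit loop.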


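Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.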
