Regex; eliminate all punctuation except
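I have the following regex, which splits on any whitespace or punctuation character. How can I exclude one or more punctuation marks from [:punct:]? Say I want to exclude the apostrophe and the comma. I know I could explicitly write [all punctuation marks in here] instead of [[:punct:]], but I'm hoping there is a way to exclude characters instead.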
X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)
[1] "I" "'" "m" "not" "that" "good" "at" "regex" "yet"
[10] "," "" "but" "am" "getting" "better" "!"
It's not clear to me what result you want, but you can use a negated class, as in this answer.
R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
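To see why the double negation works, it can help to list what the class matches on its own. The following is a small illustrative sketch, not part of the original answer: it uses base R's gregexpr and regmatches to pull out every character matched by [[:punct:]] and by [^,'[:^punct:]] from the same string. The helper name chars_matched is made up for this example.

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Hypothetical helper: return every single-character match of `class` in `x`.
chars_matched <- function(class, x) {
  regmatches(x, gregexpr(class, x, perl = TRUE))[[1]]
}

# Every punctuation character in X.
all_punct <- chars_matched("[[:punct:]]", X)

# "Not a comma, not an apostrophe, and not a non-punctuation character",
# which leaves only the punctuation other than , and '.
split_punct <- chars_matched("[^,'[:^punct:]]", X)

print(all_punct)    # [1] "'" "," "!"
print(split_punct)  # [1] "!"
```

Only "!" survives the negated class, which is why the split keeps "I'm" and "yet," intact.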
You can restrict the PCRE subpattern with a (?![',]) negative lookahead, which fails the match if the next character to the right is ' or ,:
[[:space:]]|(?=(?![',])[[:punct:]])
               ^^^^^^^^
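See the regex demo.
Details
[[:space:]] - any whitespace
| - or
(?=(?![',])[[:punct:]]) - a positive lookahead requiring that the character immediately to the right of the current position is a punctuation character, while the nested negative lookahead rules out ' and , (in effect, it requires any punctuation character other than ' and ,).
See the R online demo: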
X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)
[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
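If the set of characters to keep changes often, the pattern can be built from a vector rather than typed by hand. This is a minimal sketch under stated assumptions, not something from the original answers: split_keep and its keep argument are hypothetical names, and keep is assumed to hold one or more single punctuation characters. Each one is backslash-escaped so that characters such as ], ^ or - stay literal inside the brackets.

```r
# Hypothetical helper, not from the original answers.
split_keep <- function(x, keep = c("'", ",")) {
  # Backslash-escape every kept character so it is literal inside [...].
  # Assumes `keep` is a non-empty vector of single punctuation characters.
  keep_class <- paste0("[", paste0("\\", keep, collapse = ""), "]")
  # Same shape as the pattern above: whitespace, or a position followed by
  # punctuation that is not one of the kept characters.
  pattern <- paste0("[[:space:]]|(?=(?!", keep_class, ")[[:punct:]])")
  strsplit(x, pattern, perl = TRUE)[[1]]
}

X <- "I'm not that good at regex yet, but am getting better!"
tokens <- split_keep(X)
print(tokens)
```

With the default keep, this should reproduce the twelve tokens shown above, from "I'm" through "!".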