Regex; eliminate all punctuation except

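I have the following regex that splits on any whitespace and before any punctuation mark. How can I exclude one or more punctuation marks from :punct:? Say I want to exclude apostrophes and commas. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I was hoping there is a way to exclude characters instead.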

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)

 [1] "I"       "'"       "m"       "not"     "that"    "good"    "at"      "regex"   "yet"    
[10] ","       ""        "but"     "am"      "getting" "better"  "!"
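The empty string "" in that output appears to come from "yet, but": the lookahead splits the comma off as its own token, and the following space is then matched with nothing in front of it. A minimal sketch, keeping the question's pattern unchanged and simply dropping empty tokens with base R's nzchar():

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Split on any whitespace, or at the zero-width position before punctuation
parts <- strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl = TRUE)[[1]]

# ", " yields an empty token: the comma is split off first, and the
# following space then matches with nothing before it; drop such tokens
parts <- parts[nzchar(parts)]
print(parts)
```

This only tidies the output; it does not change which punctuation marks trigger a split.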

I'm not sure what result you want, but you can use a negated character class, as in this answer.

R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
 [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
 [8] "but"     "am"      "getting" "better"  "!"    
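The same negated class can also answer the title question directly, i.e. delete every punctuation mark except ' and ,. A minimal sketch with gsub(), assuming ASCII input (the variable name is illustrative only):

```r
X <- "I'm not that good at regex yet, but am getting better!"

# [^,'[:^punct:]] matches a character that is NOT a comma, NOT an apostrophe
# and NOT a non-punctuation character, i.e. any punctuation except , and '
other_punct <- "[^,'[:^punct:]]"

# Remove every such character; ' and , are left in place
cleaned <- gsub(other_punct, "", X, perl = TRUE)
print(cleaned)
# [1] "I'm not that good at regex yet, but am getting better"
```

The [:^punct:] negated POSIX class is PCRE syntax, so perl = TRUE is required.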

You can restrict the PCRE subpattern directly with a (?![',]) negative lookahead, which makes the match fail if the next character to the right is ' or ,:

[[:space:]]|(?=(?![',])[[:punct:]])
               ^^^^^^^^ 

See the regex demo.

Details

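  • [[:space:]] - any whitespace character
  • | - or
  • (?=(?![',])[[:punct:]]) - a positive lookahead requiring that, immediately to the right of the current position, there is no ' or , and there is one punctuation character (in effect, any punctuation character other than ' and ,).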
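To see the "lookahead, then class" subtraction in isolation, the sketch below tests single characters against the subpattern; the ^ and $ anchors are added here only so that each test covers exactly one whole character:

```r
chars <- c("'", ",", "!", ".", "-", "a", " ")

# (?![',]) rejects ' and , before [[:punct:]] is tried, so only the
# remaining punctuation characters match
is_split_punct <- grepl("^(?![',])[[:punct:]]$", chars, perl = TRUE)
print(data.frame(char = chars, splits_before = is_split_punct))
```

Only "!", "." and "-" match, which are exactly the characters before which the full pattern splits.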

See the R online demo.

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)
[[1]]
 [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
 [8] "but"     "am"      "getting" "better"  "!"
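Since the question asks about excluding one or more punctuation marks, the lookahead version can be parameterized. A minimal sketch; split_except() is a hypothetical helper, not part of either answer, and it assumes the characters to keep are non-empty and do not include ], \, ^ or -, which would need escaping inside the bracket expression:

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Hypothetical helper: split on whitespace and before any punctuation mark
# that is NOT listed in `keep`.
# Assumes `keep` is non-empty and contains no ], \, ^ or - characters.
split_except <- function(x, keep) {
  excluded <- paste(keep, collapse = "")
  pattern <- paste0("[[:space:]]|(?=(?![", excluded, "])[[:punct:]])")
  strsplit(x, pattern, perl = TRUE)
}

# With keep = c("'", ",") this builds the answer's pattern exactly
split_except(X, c("'", ","))[[1]]

# Keeping only the apostrophe: the comma is split off again
split_except(X, "'")[[1]]
```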
