
R regex to replace all punctuation except sentence markers, apostrophes and hyphens

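I am looking for a way to mark the beginning and end of sentences in R. To do this, I want to remove all punctuation except end-of-sentence markers such as periods, exclamation marks, question marks and hyphens, which I want to replace with the marker ***. At the same time, I want to keep words that contain apostrophes intact. As a concrete example, given the following string: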

txt <- "We have examined all the possibilities, however we have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"

the desired result is

txt <- "We have examined all the possibilities however we have not reached a solid conclusion *** however we keep and open mind*** Have you considered any other approach*** Haven't you***"

I have not been able to come up with a regex that does this. Any hints would be much appreciated.

You can use gsub.

> txt <- "We have examined all the possibilities, however we have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"
> gsub("[-.?!]", "<S>", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=TRUE))
[1] "We have examined all the possibilities however we have not reached a solid conclusion <S> however we keep and open mind<S> Have you considered any other approach<S> Haven't you<S>"
> gsub("[-.?!]", "***", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=TRUE))
[1] "We have examined all the possibilities however we have not reached a solid conclusion *** however we keep and open mind*** Have you considered any other approach*** Haven't you***"

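"I want to remove all punctuation except end-of-sentence markers such as periods, exclamation marks, question marks and hyphens." The negative lookahead (?![-.?!']) skips those characters and the apostrophe, so [[:punct:]] only matches the remaining punctuation:

gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=TRUE)

"...which I want to replace with the marker ***. At the same time, I want to keep words that contain apostrophes intact." The outer call replaces the sentence markers that are left:

gsub("[-.?!]", "***", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=TRUE))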
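The two calls can also be wrapped in a small helper. This is only a sketch: the function name mark_sentences and its marker argument are made up for illustration and do not come from the answer above.

```r
# Hypothetical helper; the name and the 'marker' argument are illustrative.
mark_sentences <- function(x, marker = "***") {
  # Step 1: drop every punctuation character except - . ? ! and '
  # The negative lookahead (?!...) requires perl = TRUE.
  kept <- gsub("(?![-.?!'])[[:punct:]]", "", x, perl = TRUE)
  # Step 2: replace the remaining sentence markers with the marker string.
  # Assumes 'marker' contains no backslashes, which gsub treats specially.
  gsub("[-.?!]", marker, kept)
}

print(mark_sentences("Wait, what? I don't know - maybe!"))
# [1] "Wait what*** I don't know *** maybe***"
```

Note that a hyphen inside a word (for example well-known) is replaced too, since the pattern does not distinguish it from a hyphen used as a dash.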

You can do this with two regular expressions. First, use a character class to remove the characters you do not want:

[,.]
  ^--- Whatever you want to remove, put it here

and use an empty replacement string.

Then you can use a second regular expression like this:

[?!-]
  ^--- Add characters you want to replace here

with the replacement string:

<S>
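In R, the two steps map onto two gsub calls. The following is a sketch using the character classes shown above; the variable names are illustrative. As written, the first class also deletes periods, so move the . into the second class if periods should be marked rather than removed.

```r
txt <- "We have examined all the possibilities, however we have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"

# Step 1: delete the unwanted characters (comma and period here)
# by replacing them with an empty string.
no_unwanted <- gsub("[,.]", "", txt)

# Step 2: replace the sentence markers with <S>.
# The hyphen comes last in the class so it is taken literally.
marked <- gsub("[?!-]", "<S>", no_unwanted)

print(marked)
```

Apostrophes are left alone because they appear in neither character class.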

