Removing all punctuation apart from single apostrophes and hyphens within words
R regex to replace all punctuation except sentence markers, apostrophes and hyphens
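I am looking for a way to mark the beginning and end of sentences in R. To do this, I would like to eliminate all punctuation except end-of-sentence markers such as full stops, exclamation marks, question marks and hyphens, which I would like to replace with the marker ***. At the same time, I would also like to preserve words containing apostrophes. To give a concrete example, given the following string: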
txt <- "We have examined all the possibilities, however we have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"
the desired result would be:
txt <- "We have examined all the possibilities however we have not reached a solid conclusion *** however we keep and open mind*** Have you considered any other approach*** Haven't you***"
I have not been able to come up with a regular expression that does this. Any hints would be much appreciated.
You can use gsub.
> txt <- "We have examined all the possibilities, however he have not reached a solid conclusion - however we keep and open mind! Have you considered any other approach? Haven't you?"
> gsub("[-.?!]", "<S>", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T))
[1] "We have examined all the possibilities however he have not reached a solid conclusion <S> however we keep and open mind<S> Have you considered any other approach<S> Haven't you<S>"
> gsub("[-.?!]", "***", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T))
[1] "We have examined all the possibilities however he have not reached a solid conclusion *** however we keep and open mind*** Have you considered any other approach*** Haven't you***"
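I would like to eliminate all punctuation except end-of-sentence markers such as full stops, exclamation marks, question marks and hyphens.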
gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T)
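which I would like to replace with the marker ***. At the same time, I would also like to preserve words containing apostrophes.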
gsub("[-.?!]", "***", gsub("(?![-.?!'])[[:punct:]]", "", txt, perl=T))
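The two passes above can be wrapped in a small helper so the marker is easy to swap. This is only a sketch: the function name `mark_sentences` and its `marker` argument are made up for illustration and are not part of the answer. The marker is passed to `gsub()` as a replacement string, so a marker containing backslashes would be interpreted rather than inserted literally.

```r
# Hypothetical helper (name and arguments are illustrative assumptions).
mark_sentences <- function(x, marker = "***") {
  # Pass 1: delete every punctuation character that is neither a
  # boundary mark (- . ? !) nor an apostrophe. The negative lookahead
  # needs PCRE, hence perl = TRUE.
  kept <- gsub("(?![-.?!'])[[:punct:]]", "", x, perl = TRUE)
  # Pass 2: replace each remaining boundary mark with the marker.
  gsub("[-.?!]", marker, kept)
}

mark_sentences("Hi, there! Haven't you?")
mark_sentences("a - b.", marker = "<S>")
```

Note that the second pattern treats every hyphen as a boundary, so a hyphen inside a compound word such as "well-known" is replaced as well.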
You can do this by using two regular expressions. First, you can use a character class to remove the characters you do not want:
[,.]
^--- Whatever you want to remove, put it here
and use an empty replacement string.
Then you can use a second regular expression like this:
[?!-]
^--- Add characters you want to replace here
with the replacement string:
<S>
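In R, the two steps could be sketched as follows. The function name `two_step` and its arguments are hypothetical; the default character classes are the ones given above. With those defaults a full stop is deleted rather than marked, so if full stops should count as sentence boundaries, one option is to move the `.` from the first class into the second, as in the last call.

```r
# Hypothetical helper (name and arguments are illustrative assumptions).
two_step <- function(x, remove = "[,.]", mark = "[?!-]", marker = "<S>") {
  # Step 1: delete the unwanted characters (empty replacement string).
  stripped <- gsub(remove, "", x)
  # Step 2: replace the boundary characters with the marker.
  # The trailing "-" in the class is a literal hyphen.
  gsub(mark, marker, stripped)
}

two_step("Well, yes - no! Haven't you?")

# Variant that marks full stops instead of deleting them.
two_step("One. Two, three!", remove = "[,]", mark = "[.?!-]")
```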