
Remove several strings between two specific characters

I need help with a regular expression in R. I have a bunch of strings, each with a structure similar to this one:

mytext <- "\"Dimitri. It has absolutely no meaning,\": Allow me to him|\"realize that\": Poor Alice! It |\"HIGHLIGHT A LOT OF THINGS. Our team is small and if each person highlights only 1 or 2 things, the counts of Likes\": |\"same for the Dislikes.  Thank you very much for completing this\": ME.' 'You!' sai"

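Note that this string contains substrings inside double quotes, each followed by ":" and some unquoted text, until we reach a "|". Then a new quoted substring begins, and so on.

Also note that at the very end there is text after the ":" but no closing "|".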

My goal is to remove all text starting at each ":" (including the ":" itself) up to the next "|", while keeping the "|". I also need to remove all text after the last ":".

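Finally (this is more of a bonus), I would like to get rid of all "\" characters and all quotes, because the final result has to be "clean text": a set of strings separated only by "|" characters.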

Is this possible?

Here is my awkward first attempt:

gsub('\\:.*?\\|', '', mytext)
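To see where this attempt falls short, here is a minimal sketch on a shortened string. The variable `short` is a made-up sample with the same structure, not the original data. The pattern consumes the "|" along with the text before it, and the tail after the last ":" is never matched because no "|" follows it.

```r
# Hypothetical shortened sample with the same structure as mytext
short <- "\"aa\": x|\"bb\": y|\"cc\": z"

# Same pattern as the first attempt: ":" up to and including the next "|"
first_try <- gsub('\\:.*?\\|', '', short)

# The pipes are gone and the trailing ": z" is still there
print(first_try)
```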

This approach uses three passes of sub/gsub.

sub("\\|$", "", gsub("[\\\\\"]", "", gsub(":.*?(\\||$)", "|", mytext)))
[1] "Dimitri. It has absolutely no meaning,|realize that|HIGHLIGHT A LOT OF THINGS. Our team is small and if each person highlights only 1 or 2 things, the counts of Likes|same for the Dislikes.  Thank you very much for completing this"

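The first pass removes the text from each ":" through the next "|" (or the end of the string) and replaces it with "|". The second pass removes "\" characters and double quotes, and the third pass removes the trailing "|".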
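The three passes can also be written out one at a time, which makes the intermediate results easier to inspect. This is only a sketch on a made-up shortened string (`short`, `step1`, `step2`, and `step3` are illustrative names, not part of the original answer).

```r
# Hypothetical shortened sample with the same structure as mytext
short <- "\"aa\": x|\"bb\": y|\"cc\": z"

# Pass 1: replace ":" plus everything up to the next "|" (or the end
# of the string) with a single "|"
step1 <- gsub(":.*?(\\||$)", "|", short)

# Pass 2: drop backslashes and double quotes
step2 <- gsub("[\\\\\"]", "", step1)

# Pass 3: drop the one trailing "|" that pass 1 left at the end
step3 <- sub("\\|$", "", step2)

print(step3)
```

Pass 3 uses sub rather than gsub because at most one trailing "|" can match at the end of the string.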

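With a single gsub you can match the text after each ":" (including the ":" itself) as long as it contains no pipe, using the pattern :[^|]* which also covers the case at the end of the string. You can match the double quotes in the same call by adding a second pattern after the alternation operator (|), here [\"] written as an R string literal. The sample string contains no literal backslashes (each \" is just an escaped quote), so matching the quote alone is enough.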

gsub(":[^|]*|[\"]", "", mytext)
#[1] "Dimitri. It has absolutely no meaning,|realize that|HIGHLIGHT A LOT OF THINGS. Our team is small and if each person highlights only 1 or 2 things, the counts of Likes|same for the Dislikes.  Thank you very much for completing this"
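If the cleaned string is needed as separate pieces, the single-gsub approach can be wrapped in a small helper and combined with strsplit. This is a sketch under assumptions: `clean_text` and `short` are hypothetical names, the character class is extended to also remove literal backslashes (a small variation on the pattern above), and the quoted parts are assumed to contain no ":" of their own, since :[^|]* would cut them short.

```r
# Hypothetical helper wrapping the single-gsub approach
clean_text <- function(x) {
  # Remove ":" plus everything up to (but not including) the next "|",
  # and also remove backslashes and double quotes
  gsub(":[^|]*|[\\\\\"]", "", x)
}

# Hypothetical shortened sample with the same structure as mytext
short <- "\"aa\": x|\"bb\": y|\"cc\": z"
cleaned <- clean_text(short)

# fixed = TRUE treats "|" as a literal pipe, not as regex alternation
pieces <- strsplit(cleaned, "|", fixed = TRUE)[[1]]
print(pieces)
```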

