Remove several strings between two specific characters
I need help with a regular expression in R. I have a bunch of strings, each with a structure similar to this one:
mytext <- "\"Dimitri. It has absolutely no meaning,\": Allow me to him|\"realize that\": Poor Alice! It |\"HIGHLIGHT A LOT OF THINGS. Our team is small and if each person highlights only 1 or 2 things, the counts of Likes\": |\"same for the Dislikes. Thank you very much for completing this\": ME.' 'You!' sai"
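Note that this string contains substrings inside double quotes, each followed by ":" and some unquoted text, until we hit a "|"; then a new quoted substring appears, and so on.
Also note that at the very end there is text after a ":" but no closing "|".
My goal is to completely remove all text starting at each ":" (including the ":") up to the next "|" (but the "|" itself must be kept). I also need to remove all text after the last ":".
Finally (this is more of a bonus), I would like to get rid of all "\" characters and all double quotes, because in the final result I need "plain text": a single string of substrings separated only by "|" characters.
Is this possible?
Here is my awkward first attempt: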
gsub('\\:.*?\\|', '', mytext)
This approach uses g?sub in three passes (two gsub calls and one sub call).
sub("\\|$", "", gsub("[\\\\\"]", "", gsub(":.*?(\\||$)", "|", mytext)))
[1] "Dimitri. It has absolutely no meaning,|realize that|HIGHLIGHT A LOT OF THINGS. Our team is small and if each person highlights only 1 or 2 things, the counts of Likes|same for the Dislikes. Thank you very much for completing this"
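The first pass removes the text from each ":" through the next "|" (or through the end of the string), inclusive, and replaces it with "|". The second pass removes backslashes and double quotes, and the third pass removes the trailing "|".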
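To make the three passes easier to follow, here is a minimal sketch that runs them one at a time. The sample string x and the names step1, step2 and step3 are made up for illustration; x only mimics the shape of mytext.

```r
# Shortened, made-up sample with the same shape as mytext
x <- "\"first item\": note one|\"second item\": note two|\"third item\": trailing text"

# Pass 1: replace each ':' through the next '|' (or the end of the string) with '|'
step1 <- gsub(":.*?(\\||$)", "|", x)

# Pass 2: remove backslashes and double quotes
step2 <- gsub("[\\\\\"]", "", step1)

# Pass 3: drop the single trailing '|' that pass 1 leaves at the end
step3 <- sub("\\|$", "", step2)

print(step3)
```

Printing step1 and step2 as well shows what each pass contributes: the trailing "|" only exists because pass 1 also replaces the final ":" segment, which ends at the end of the string rather than at a pipe.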
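With a single gsub
You can match the text that follows ":" (including the ":" itself), as long as it does not contain a pipe: :[^|]*
This also covers the case at the end of the string. You can match the double quotes as well by adding a second pattern after the alternation character (|): [\"]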
gsub(":[^|]*|[\"]", "", mytext)
#[1] "Dimitri. It has absolutely no meaning,|realize that|HIGHLIGHT A LOT OF THINGS. Our team is small and if each person highlights only 1 or 2 things, the counts of Likes|same for the Dislikes. Thank you very much for completing this"
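The call above strips double quotes but not backslashes, which mytext happens not to contain. If the input might hold literal backslashes (the "bonus" part of the question), one possible variant is to add a backslash to the character class. This is a sketch under that assumption; the sample string x and the names cleaned and parts are hypothetical.

```r
# Made-up sample; the first quoted item contains one literal backslash
x <- "\"first\\ item\": note one|\"second item\": trailing text"

# Same single-pass idea, with a backslash added to the character class
cleaned <- gsub(":[^|]*|[\\\\\"]", "", x)
print(cleaned)

# Optional: split the cleaned string on the literal '|' separator
parts <- strsplit(cleaned, "|", fixed = TRUE)[[1]]
print(parts)
```

Note that :[^|]* starts matching at any colon, so this relies on the quoted substrings themselves containing no ":". The fixed = TRUE argument makes strsplit treat "|" as a literal character rather than as regex alternation.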