R: Extract capital letters and special characters with strsplit and perl REGEX syntax
How can I extract only the "/" together with the capital letters that follow it, as well as the whole "[[:punct:]]/$[[:punct:]]" token?
text <- c("This/ART ,/$; Is/NN something something/else A/VAFIN faulty/ADV text/ADV which/ADJD i/PWS propose/ADV as/APPR Example/NE ./$. So/NE It/PTKNEG makes/ADJD no/VAFIN sense/ADV at/KOUS all/PDAT ,/$, it/APPR Has/ADJA Errors/NN ,/$; and/APPR it/APPR is/CARD senseless/NN again/ART ./$:")
# HOW to?
textPOS <- strsplit(text,"( )|(?<=[[:punct:]]/\\$[[:punct:]])", perl=TRUE)
# ^^^
# extract only the "/" with the following capital letters
# and the whole "[[:punct:]]/$[[:punct:]]"
# Expected RETURN:
> textPOS
[1] "/ART" ",/$;" "/NN" "/VAFIN" "/ADV" "/ADV" "/ADJD" "/PWS" "/ADV" "/APPR" "/NE" "./$." "/NE" "/PTKNEG" "/ADJD" "/VAFIN" "/ADV" "/KOUS" "/PDAT" ",/$," "/APPR" "/ADJA" "/NN" ",/$;" "/APPR" "/APPR" "/CARD" "/NN" "/ART" "./$:"
Thanks! :)
You can use gregexpr and regmatches:
regmatches(text, gregexpr('[[:punct:]]*/[[:alpha:][:punct:]]*', text))
# [[1]]
# [1] "/ART" "/NN" "/VAFIN" "/ADV" "/ADV" "/ADJD" "/PWS" "/ADV" "/APPR" "/NE" "./$." "/NE"
# [13] "/PTKNEG" "/ADJD" "/VAFIN" "/ADV" "/KOUS" "/PDAT" ",/$," "/APPR" "/ADJA" "/NN" ",/$;" "/APPR"
# [25] "/APPR" "/CARD" "/NN" "/ART" "./$:"
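In words, the regular expression says: "find something that starts with zero or more punctuation characters, followed by a slash, followed by zero or more letters or punctuation characters." If you also want to include digits, switch [:alpha:] to [:alnum:].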
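As a minimal sketch of the [:alnum:] variant: the input string below is made up for illustration (the tag "/NN2" does not appear in the question), and the variable names are hypothetical.

```r
# Hypothetical input with a digit inside a tag (not from the question)
sample_text <- "foo/NN2 bar/ADV"

# Pattern from the answer: letters or punctuation after the slash
with_alpha <- regmatches(
  sample_text,
  gregexpr('[[:punct:]]*/[[:alpha:][:punct:]]*', sample_text)
)[[1]]

# Same pattern with [:alnum:], so digits are kept as part of the tag
with_alnum <- regmatches(
  sample_text,
  gregexpr('[[:punct:]]*/[[:alnum:][:punct:]]*', sample_text)
)[[1]]

print(with_alpha)
print(with_alnum)
```

With [:alpha:] the match stops before the digit, while [:alnum:] carries on through it.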
Following the comments, if you only want uppercase letters, the regular expression becomes:
regmatches(text, gregexpr('[[:punct:]]*/[[:upper:][:punct:]]*', text))
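One thing worth checking with this pattern: because the trailing class uses `*`, a slash followed by lowercase letters (such as "something/else" in the question's text) still produces a bare "/" match. The sketch below uses a shortened, made-up input to show this, and changing `*` to `+` is an assumption on my part, not something stated in the answer.

```r
# Shortened sample built from tokens in the question's text
sample_text <- "something/else A/VAFIN ,/$;"

# Pattern from the answer: zero or more uppercase/punct after the slash
star_matches <- regmatches(
  sample_text,
  gregexpr('[[:punct:]]*/[[:upper:][:punct:]]*', sample_text)
)[[1]]

# Assumed variation: require at least one uppercase/punct after the slash
plus_matches <- regmatches(
  sample_text,
  gregexpr('[[:punct:]]*/[[:upper:][:punct:]]+', sample_text)
)[[1]]

print(star_matches)  # includes a lone "/" taken from "something/else"
print(plus_matches)  # the lone "/" is gone
```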
As @eddi suggested, [A-Z] and [:upper:] are roughly equivalent. Again as @eddi showed, this regular expression captures both the /LETTERS case and the /$ punct case:
/[A-Z]+|[[:punct:]]/\\$[[:punct:]]
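For completeness, here is a sketch of how that pattern might be plugged into the same regmatches/gregexpr call. The input is a shortened version of the question's text, and passing `perl = TRUE` is my assumption (the answer does not say which regex engine was used).

```r
# Shortened version of the text from the question
sample_text <- "This/ART ,/$; Is/NN something something/else A/VAFIN ./$."

# "/" plus capitals, OR punct + "/$" + punct; "\\$" is a literal dollar sign
pattern <- "/[A-Z]+|[[:punct:]]/\\$[[:punct:]]"

tags <- regmatches(sample_text, gregexpr(pattern, sample_text, perl = TRUE))[[1]]
print(tags)
# [1] "/ART"   ",/$;"   "/NN"    "/VAFIN" "./$."
```

Note that "something/else" contributes nothing here, since `[A-Z]+` requires at least one capital letter after the slash.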