Extracting a single word from a vector of strings in R
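Suppose I have a vector of strings like the one below, and I want to create a logical vector that is TRUE if the word "white", "bull", or "tiger" appears in the string (note: not "whitetip") and FALSE if it does not. How can I do this in R? I tried stringr's str_detect(), but it returned TRUE for "whitetip" (and I don't know how to use str_detect() for each category, i.e. I would have to create several logical vectors, one for each of my three species: white, bull, and tiger). Any help would be great, thanks!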
string<-c("tiger?", "thought to involve a 2.7 m [9'], 400-kb bull",
"4 m to 5 m [13' to 16.5'] white", "oceanic whitetip shark, 2.5 to 3m",
"white","white","bull","white","oceanic whitetip shark, 2.5m","tiger",
"white, >6'","bull, 6'")
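Here is one way to match all of the strings. The \\b anchors are word boundaries, so "white" is not matched inside "whitetip":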
sapply(c("white","bull","tiger"), function(x) {
grepl(paste0("\\b",x,"\\b"), string)
})
This gives:
white bull tiger
[1,] FALSE FALSE TRUE # tiger?
[2,] FALSE TRUE FALSE # thought to involve a 2.7 m [9'], 400-kb bull
[3,] TRUE FALSE FALSE # 4 m to 5 m [13' to 16.5'] white
[4,] FALSE FALSE FALSE # oceanic whitetip shark, 2.5 to 3m
[5,] TRUE FALSE FALSE # white
[6,] TRUE FALSE FALSE # white
[7,] FALSE TRUE FALSE # bull
[8,] TRUE FALSE FALSE # white
[9,] FALSE FALSE FALSE # oceanic whitetip shark, 2.5m
[10,] FALSE FALSE TRUE # tiger
[11,] TRUE FALSE FALSE # white, >6'
[12,] FALSE TRUE FALSE # bull, 6'
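If a single logical vector is wanted (TRUE when any of the three words appears), the per-word matrix can be collapsed, or the words can be joined into one alternation pattern. The following is a minimal sketch in base R; the names words, pattern, m, has_word and any_match are illustrative only and do not come from the original answer.

```r
string <- c("tiger?", "thought to involve a 2.7 m [9'], 400-kb bull",
            "4 m to 5 m [13' to 16.5'] white", "oceanic whitetip shark, 2.5 to 3m",
            "white", "white", "bull", "white", "oceanic whitetip shark, 2.5m", "tiger",
            "white, >6'", "bull, 6'")

# Hypothetical helper names, chosen for this sketch
words <- c("white", "bull", "tiger")

# Option 1: one pattern with alternation, wrapped in word boundaries
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
has_word <- grepl(pattern, string)

# Option 2: build the per-word matrix, then test each row for any TRUE
m <- sapply(words, function(x) grepl(paste0("\\b", x, "\\b"), string))
any_match <- rowSums(m) > 0

has_word
```

Both options should flag every element except the two "oceanic whitetip shark" entries. Option 2 keeps the per-word columns available in m if a separate vector for each species is still needed, for example m[, "bull"].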
If you need to extract the matching word itself, you can use stringr::str_extract:
library(stringr)
str_extract(string, "\\b(bull|tiger|white)\\b")
# [1] "tiger" "bull" "white" NA "white" "white" "bull" "white" NA
#[10] "tiger" "white" "bull"
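For cases where stringr is not available, roughly the same extraction can be done in base R with regexpr() and regmatches(). This is a sketch rather than part of the original answer; the variable names pos and word are assumptions made for illustration.

```r
string <- c("tiger?", "thought to involve a 2.7 m [9'], 400-kb bull",
            "4 m to 5 m [13' to 16.5'] white", "oceanic whitetip shark, 2.5 to 3m",
            "white", "white", "bull", "white", "oceanic whitetip shark, 2.5m", "tiger",
            "white, >6'", "bull, 6'")

pattern <- "\\b(bull|tiger|white)\\b"

# regexpr() returns the position of the first match, or -1 where there is none
pos <- regexpr(pattern, string)

# regmatches() returns only the matched substrings, so start from a vector
# of NA and fill in the positions that actually matched
word <- rep(NA_character_, length(string))
word[pos != -1] <- regmatches(string, pos)

word
```

Like str_extract, this keeps only the first match in each string and leaves NA where none of the words occurs.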