
Exact matching of text with a dataframe column in R

I have a vector of words in R:

words = c("Awesome","Loss","Good","Bad")

I have the following dataframe in R:

df <- data.frame(ID = c(1,2,3),
                 Response = c("Today is an awesome day", 
                              "Yesterday was a bad day,but today it is good",
                              "I have losses today"))

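What I want is for the words that match exactly (as whole words) in the Response column to be extracted and inserted into a new column of the dataframe. The final output should look like this: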

ID  Response                                      Match
1   Today is an awesome day                       Awesome
2   Yesterday was a bad day,but today it is good  Bad,Good
3   I have losses today                           NA

I used the following code.

Extract the list of matching words:

x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))

Paste the matched words together:

df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))

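This does return matches, but they are not exact: "Loss" is reported for the third row because it matches inside "losses". Please help.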
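The partial match can be reproduced in isolation. This is only a minimal check built from the third response in the question; the variable name `partial` is made up for illustration:

```r
# grepl() searches for the pattern anywhere in the string (substring match),
# so the pattern "loss" is found inside the word "losses"
partial <- grepl("loss", tolower("I have losses today"))
partial
# [1] TRUE
```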

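If you use anchors in the words vector, you can force an exact match: ^ asserts the start of the string and $ asserts the end of the string (the whole string, not an individual word). Note that "^Loss$" therefore only matches a response that is exactly "loss"; it excludes "losses" here, but it would also miss a response such as "a loss today". So: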

words = c("Awesome","^Loss$","Good","Bad")

Then use your code:

x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))
df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))

This gives:

> df
  ID                                     Response    Words
1  1                      Today is an awesome day  Awesome
2  2 Yesterday was a bad day,but today it is good Good,Bad
3  3                          I have losses today  

To turn the blanks into NA:

df$Words[df$Words == ""] <- NA
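A small comparison may help show what the anchors do and do not cover. The test strings in `responses` are invented for this sketch (only the last one comes from the question), and `\\b` is the word-boundary escape supported by R's default regex engine:

```r
# Three made-up test strings: exactly "loss", "loss" as a whole word, and "losses"
responses <- c("loss", "a loss today", "i have losses today")

# ^ and $ anchor the whole string: only the string that is exactly "loss" matches
anchored <- grepl("^loss$", responses)
anchored
# [1]  TRUE FALSE FALSE

# \\b marks a word boundary: "loss" matches as a whole word anywhere in the string
bounded <- grepl("\\bloss\\b", responses)
bounded
# [1]  TRUE  TRUE FALSE
```

If the real data can contain "loss" as a whole word inside a longer sentence, the word-boundary form is probably the one you want.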

We can use str_extract_all with a case-insensitive pattern wrapped in word boundaries:

library(stringr)
library(dplyr)
library(purrr)
df %>%
    mutate(Words = map_chr(str_extract_all(Response,
        str_c("(?i)\\b(", str_c(words, collapse = "|"), ")\\b")), toString))
#   ID                                     Response     Words
#1  1                      Today is an awesome day   awesome
#2  2 Yesterday was a bad day,but today it is good bad, good
#3  3                          I have losses today          

Data

words <- c("Awesome","Loss","Good","Bad")
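For readers who prefer not to load the tidyverse packages, a roughly equivalent base R sketch is shown here. It is not part of the answer above; it assumes the `df` and `words` from the question and that the words contain no regex metacharacters. The names `pattern` and `hits` are illustrative only:

```r
words <- c("Awesome", "Loss", "Good", "Bad")
df <- data.frame(ID = c(1, 2, 3),
                 Response = c("Today is an awesome day",
                              "Yesterday was a bad day,but today it is good",
                              "I have losses today"),
                 stringsAsFactors = FALSE)

# Build one alternation pattern: \b(Awesome|Loss|Good|Bad)\b
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# gregexpr() locates every match in each row; regmatches() extracts the matched text
hits <- regmatches(df$Response,
                   gregexpr(pattern, df$Response, ignore.case = TRUE))

# Collapse each row's matches into one string; rows without a match become NA
df$Words <- vapply(hits,
                   function(h) if (length(h)) toString(h) else NA_character_,
                   character(1))
df$Words
# [1] "awesome"   "bad, good" NA
```

As with str_extract_all, the result keeps the casing found in the text ("awesome") rather than the casing of the words vector ("Awesome").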

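Change the function in the first *apply call to a two-line function. If the regular expression becomes "\\bword\\b", it only matches the word when it is surrounded by word boundaries.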

x <- sapply(words, function(x) {
  y <- paste0("\\b", x, "\\b")
  grepl(tolower(y), tolower(df$Response))
})

Now run the second apply call posted in the question.

df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))

df
#  ID                                     Response    Words
#1  1                      Today is an awesome day  Awesome
#2  2 Yesterday was a bad day,but today it is good Good,Bad
#3  3                          I have losses today       

As for the NA, I would use the replacement function is.na<-:

is.na(df$Words) <- df$Words == ""

Data.

df <- read.table(text = "
ID           Response
1            'Today is an awesome day'
2            'Yesterday was a bad day,but today it is good'
3            'I have losses today'
", header = TRUE)

words <- c("Awesome","Loss","Good","Bad")
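The steps of this answer can be wrapped into one function. This is only a sketch: `match_words` is a hypothetical helper name, and it assumes the words contain no regex metacharacters. It also swaps `tolower()` for `ignore.case = TRUE`, because lower-casing a pattern would silently change escapes such as `\\B` or `\\W` into `\\b` or `\\w` (that is not a problem with the `\\b` used here, but it is easy to trip over later):

```r
# Hypothetical helper: for each string in `text`, return the comma-separated
# entries of `words` that occur as whole words (case-insensitive), or NA if none
match_words <- function(text, words) {
  hit <- sapply(words, function(w) {
    grepl(paste0("\\b", w, "\\b"), text, ignore.case = TRUE)
  })
  # sapply() returns a plain vector when `text` has length 1; force a matrix
  hit <- matrix(hit, nrow = length(text), dimnames = list(NULL, words))
  out <- apply(hit, 1, function(i) paste0(words[i], collapse = ","))
  is.na(out) <- out == ""   # empty string means no word matched
  out
}

res <- match_words(c("Today is an awesome day",
                     "Yesterday was a bad day,but today it is good",
                     "I have losses today"),
                   c("Awesome", "Loss", "Good", "Bad"))
res
# [1] "Awesome"  "Good,Bad" NA
```

With the data above this would be called as `df$Words <- match_words(df$Response, words)`.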
