Matching text against a data frame column in R

Question

I have a vector of words in R:

words = c("Awesome","Loss","Good","Bad")

I also have the following data frame in R:

ID           Response
1            Today is an awesome day
2            Yesterday was a bad day,but today it is good
3            I have losses today

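I want to extract the words that match in the Response column and put them into a new column of the data frame, together with a count of how many words matched. The final output should look like this: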

ID           Response                                        Match          Count
1            Today is an awesome day                         Awesome        1
2            Yesterday was a bad day,but today it is good    Bad,Good       2
3            I have losses today                             Loss           1

In R, I tried the following:

sapply(words,grepl,df$Response)

This matches the words, but how do I get a data frame in the format above? Please help.
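Note that grepl() is case-sensitive by default. Because the entries in words are capitalised, the call above returns FALSE everywhere for this data: "Awesome" does not match "awesome", and "Loss" does not match "losses". The sketch below assumes the data frame is named df, as in the question. It rebuilds the data and compares the default call with one that passes ignore.case = TRUE. The resulting logical matrix has one row per response and one column per word. The answers below turn a matrix like this into the Match and Count columns.

```r
# Rebuild the question's data (the data frame is assumed to be called df)
words <- c("Awesome", "Loss", "Good", "Bad")
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# Default (case-sensitive): no word matches any response here
hits_cs <- sapply(words, grepl, df$Response)

# Case-insensitive: rows = responses, columns = words
hits <- sapply(words, grepl, df$Response, ignore.case = TRUE)
print(hits)
```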

Answer 1

Using base R (thanks also to PereG for help with the more concise df$Count):

# extract the list of matching words
x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))

# paste the matching words together
df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))

# count the number of matching words
df$Count <- apply(x, 1, function(i) sum(i))

# df
#  ID                                     Response    Words Count
#1  1                      Today is an awesome day  Awesome     1
#2  2 Yesterday was a bad day,but today it is good Good,Bad     2
#3  3                          I have losses today     Loss     1
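This answer lists the matches in the order of words (Good,Bad). The question's desired output lists them in the order they appear in the response (Bad,Good). If that order matters, one option is to record where each word first occurs with regexpr() and sort by that position. Below is a minimal base R sketch, assuming the data frame is called df. The Match and Count column names follow the question's desired output.

```r
words <- c("Awesome", "Loss", "Good", "Bad")
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# Start position of the first occurrence of each word in each response (-1 = no match)
pos <- sapply(words, function(w) regexpr(w, df$Response, ignore.case = TRUE))

# For each row, keep the matched words and order them by where they appear in the text
df$Match <- apply(pos, 1, function(p) {
  hit <- p[p > 0]
  paste(names(hit)[order(hit)], collapse = ",")
})

# Number of distinct words found in each response
df$Count <- rowSums(pos > 0)
print(df)
```

Only the first occurrence of each word is located, so Count here is the number of distinct words found, as in the answer above.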

Answer 2

Here is another option, which stores the matches in a list column:

vgrepl <- Vectorize(grepl, "pattern")
df$Match <- lapply(df$Response, function(x) 
  words[vgrepl(words, x, ignore.case = TRUE)]
)
df$Count <- lengths(df$Match)
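Because Match is a list column here, each row holds a character vector rather than the single comma-separated string shown in the question. The sketch below assumes df and words as in the question. It builds the list column as above and then collapses each element with vapply() for cases where a plain string column is wanted.

```r
words <- c("Awesome", "Loss", "Good", "Bad")
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# grepl vectorised over its pattern argument
vgrepl <- Vectorize(grepl, "pattern")

# List column: one character vector of matched words per response
df$Match <- lapply(df$Response, function(x)
  words[vgrepl(words, x, ignore.case = TRUE)]
)
df$Count <- lengths(df$Match)

# Collapse each list element into one comma-separated string
df$Match <- vapply(df$Match, paste, character(1), collapse = ",")
print(df)
```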

Answer 3

With df as the data frame and using stringr, the following also works:

library(stringr)

matches <- sapply(seq_along(words), function(i) str_extract_all(tolower(df$Response),
                                                     tolower(words[i]), simplify = TRUE))
# join only the non-empty matches in each row with commas
df$Match <- apply(matches, 1, function(x) paste(x[x != ''], collapse = ','))
df$Count <- apply(matches, 1, function(x) sum(x != ''))
head(df)

#  ID                                     Response    Match Count
#1  1                      Today is an awesome day  awesome     1
#2  2 Yesterday was a bad day,but today it is good good,bad     2
#3  3                          I have losses today     loss     1

Answer 4

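A tidyverse solution/suggestion. It reports the text that actually matched rather than the case-insensitive pattern that matched it, but it should be enough for illustration.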

library(stringr)
library(dplyr)
library(purrr)

words <- c("Awesome", "Loss", "Good", "Bad")
"ID;Response
1;Today is an awesome day
2;Yesterday was a bad day,but today it is good
3;I have losses today" %>%
  textConnection %>%
  read.table(header = TRUE, 
             sep = ";",
             stringsAsFactors = FALSE) ->
  d

d %>%
  mutate(matches = str_extract_all(
                     Response,
                     str_c(words, collapse = "|") %>% regex(ignore_case = TRUE)),
         Match = map_chr(matches, str_c, collapse = ","),
         Count = map_int(matches, length))
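The same idea can also be sketched in base R without the tidyverse packages: build one alternation pattern and report the text as it actually appears in each response. The sketch rebuilds words and d so that it runs on its own. gregexpr() finds every case-insensitive occurrence of the pattern, and regmatches() extracts the matched text as a list with one element per row.

```r
words <- c("Awesome", "Loss", "Good", "Bad")
d <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# One alternation pattern: "Awesome|Loss|Good|Bad"
pattern <- paste(words, collapse = "|")

# Every occurrence of any word, case-insensitive; list with one element per row
matches <- regmatches(d$Response, gregexpr(pattern, d$Response, ignore.case = TRUE))

d$Match <- vapply(matches, paste, character(1), collapse = ",")
d$Count <- lengths(matches)
print(d)
```

Because every occurrence is extracted, a word that appears twice in one response is counted twice. The logical-matrix approaches in Answers 1 and 2 count it once.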

Matching text against a data frame column in R

Question

4 answers

Answer 1
4 votes, accepted, 2016-12-28 18:38:29

Answer 2
0 votes, 2016-12-28 19:45:35

Answer 3
0 votes, 2016-12-28 21:48:05

Answer 4
0 votes, 2016-12-28 21:48:43
