Matching text with a data frame column in R
I have a vector of words in R.
words = c("Awesome","Loss","Good","Bad")
I also have the following data frame in R:
ID Response
1 Today is an awesome day
2 Yesterday was a bad day,but today it is good
3 I have losses today
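What I want to do is extract the words that match in the Response column and insert them into a new column of the data frame. The final output should look like this:
ID  Response                                      Match     Count
1   Today is an awesome day                       Awesome   1
2   Yesterday was a bad day,but today it is good  Bad,Good  2
3   I have losses today                           Loss      1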
I have tried the following in R:
sapply(words,grepl,df$Response)
It matches the words, but how do I get the data frame in the desired format? Please help.
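For reference, here is a minimal sketch of what that call returns on the sample data. The `data.frame()` construction is an assumption, since the question only shows the printed table. On this data the case-sensitive call finds nothing, because the words are capitalised while the responses are in lower case. Passing `ignore.case = TRUE` through `sapply()` to `grepl()` gives the expected hits.

```r
# Assumed reconstruction of the sample data from the question
words <- c("Awesome", "Loss", "Good", "Bad")
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per response, one column per word (case-sensitive)
m_sensitive <- sapply(words, grepl, df$Response)

# Extra arguments to sapply() are forwarded to grepl()
m_insensitive <- sapply(words, grepl, df$Response, ignore.case = TRUE)

print(m_sensitive)
print(m_insensitive)
```

The second matrix is the starting point for the answers below: each row says which words occur in that response.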
Using base R (with help from PereG's concise answer for df$Count):
# extract the list of matching words
x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))
# paste the matching words together
df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))
# count the number of matching words
df$Count <- apply(x, 1, function(i) sum(i))
# df
# ID Response Words Count
#1 1 Today is an awesome day Awesome 1
#2 2 Yesterday was a bad day,but today it is good Good,Bad 2
#3 3 I have losses today Loss 1
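The same idea can be wrapped in a small helper. This is only a sketch: `match_words` is a hypothetical name, not part of the answer above. It builds the logical matrix with `cbind()`, so a single response still yields a matrix, and it uses `rowSums()` for the count.

```r
# Hypothetical helper: for each response, which words occur in it (ignoring case)?
match_words <- function(responses, words) {
  # cbind() always yields a matrix, even when there is only one response
  hits <- do.call(cbind, lapply(words, grepl, x = responses, ignore.case = TRUE))
  colnames(hits) <- words
  data.frame(
    # collapse the matching words of each row into one comma-separated string
    Match = vapply(seq_len(nrow(hits)),
                   function(r) paste(words[hits[r, ]], collapse = ","),
                   character(1)),
    # number of distinct words found per response
    Count = as.integer(rowSums(hits)),
    stringsAsFactors = FALSE
  )
}

words <- c("Awesome", "Loss", "Good", "Bad")
res <- match_words(c("Today is an awesome day",
                     "Yesterday was a bad day,but today it is good",
                     "I have losses today",
                     "Nothing here"),
                   words)
print(res)
```

The result can be attached with `cbind(df, res)`. Note that `grepl()` does substring matching, which is why "Loss" is found inside "losses".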
Here is another option, which stores the matches in lists:
vgrepl <- Vectorize(grepl, "pattern")
df$Match <- lapply(df$Response, function(x)
  words[vgrepl(words, x, ignore.case = TRUE)]
)
df$Count <- lengths(df$Match)
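A list column is convenient for further processing but prints less tidily than a string. The sketch below rebuilds the sample data (an assumed reconstruction), creates the list column with `vapply()` in place of the vectorised `grepl()`, and then shows two ways to flatten it: a comma-separated string and a long table with one row per match. `MatchString` and `long` are illustrative names.

```r
words <- c("Awesome", "Loss", "Good", "Bad")
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# List column: each element holds the words found in that response
df$Match <- lapply(df$Response, function(x)
  words[vapply(words, grepl, logical(1), x = x, ignore.case = TRUE)])
df$Count <- lengths(df$Match)

# Option 1: collapse each list element into a single string
df$MatchString <- vapply(df$Match, paste, character(1), collapse = ",")

# Option 2: long format, one row per (ID, matched word) pair
long <- data.frame(ID = rep(df$ID, df$Count),
                   Word = unlist(df$Match),
                   stringsAsFactors = FALSE)

print(df$MatchString)
print(long)
```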
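With df as the data frame and using stringr, the following will also work:
library(stringr)
# one column per word: the matched (lower-cased) text, or '' when there is no match
matches <- sapply(seq_along(words), function(i)
  str_extract_all(tolower(df$Response), tolower(words[i]), simplify = TRUE))
# paste only the non-empty matches, so a skipped word in the middle cannot fuse its neighbours
df$Match <- apply(matches, 1, function(x) paste(x[x != ''], collapse = ','))
df$Count <- apply(matches, 1, function(x) sum(x != ''))
head(df)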
# ID Response Match Count
#1 1 Today is an awesome day awesome 1
#2 2 Yesterday was a bad day,but today it is good good,bad 2
#3 3 I have losses today loss 1
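A tidyverse solution/suggestion. It reports the actual matched text rather than the case-insensitively matched pattern, but it should be sufficient for illustration purposes.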
library(stringr)
library(dplyr)
library(purrr)
words <- c("Awesome", "Loss", "Good", "Bad")
"ID;Response
1;Today is an awesome day
2;Yesterday was a bad day,but today it is good
3;I have losses today" %>%
textConnection %>%
read.table(header = TRUE,
sep = ";",
stringsAsFactors = FALSE) ->
d
d %>%
  mutate(matches = str_extract_all(
           Response,
           str_c(words, collapse = "|") %>% regex(ignore_case = TRUE)),
         Match = map_chr(matches, str_c, collapse = ","),
         Count = map_int(matches, length))
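The single-pattern approach can also be sketched in base R with `gregexpr()` and `regmatches()`. This is an alternative illustration, not part of the answer above. It maps each matched text back to its entry in `words`, so the output shows "Bad,Good" rather than "bad,good", in order of appearance. It assumes the words are plain text without regex metacharacters, and because of `unique()` it counts distinct words rather than occurrences.

```r
words <- c("Awesome", "Loss", "Good", "Bad")
responses <- c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today")

# One alternation pattern covering all words
pattern <- paste(words, collapse = "|")

# Actual matched substrings per response, in order of appearance
found <- regmatches(responses, gregexpr(pattern, responses, ignore.case = TRUE))

# Map each matched text back to the spelling used in `words`
canonical <- lapply(found, function(m)
  unique(words[match(tolower(m), tolower(words))]))

Match <- vapply(canonical, paste, character(1), collapse = ",")
Count <- lengths(canonical)

print(data.frame(Response = responses, Match = Match, Count = Count))
```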