I have a vector of words in R:
words = c("Awesome","Loss","Good","Bad")
And I have the following data frame in R:
ID Response
1 Today is an awesome day
2 Yesterday was a bad day,but today it is good
3 I have losses today
What I want to do is extract the words that match in the Response column and insert them into a new column in the data frame. The final output should look like this:
ID Response Match Count
1 Today is an awesome day Awesome 1
2 Yesterday was a bad day,but today it is good Bad,Good 2
3 I have losses today Loss 1
I tried the following in R:
sapply(words,grepl,df$Response)
It matches the words, but how do I get my data frame into the desired format? Please help.
Using base R (credit to PereG as well for help with the more concise way of computing df$Count):
# extract the list of matching words
x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))
# paste the matching words together
df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))
# count the number of matching words
df$Count <- apply(x, 1, function(i) sum(i))
# df
# ID Response Words Count
#1 1 Today is an awesome day Awesome 1
#2 2 Yesterday was a bad day,but today it is good Good,Bad 2
#3 3 I have losses today Loss 1
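For anyone who wants to run the base R approach end to end, here is a minimal sketch. The construction of df is my assumption based on the table in the question, and the name hits is only illustrative. It swaps sapply for vapply and forces the result into a matrix, since sapply returns a plain vector rather than a matrix when df has a single row.

```r
words <- c("Awesome", "Loss", "Good", "Bad")

# example data, reconstructed from the question (assumed layout)
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# one logical value per (response, word) pair; matching is by substring,
# which is why "losses" counts as a hit for "Loss"
hits <- vapply(words,
               function(w) grepl(w, df$Response, ignore.case = TRUE),
               logical(nrow(df)))

# keep a matrix shape even when df has only one row
hits <- matrix(hits, nrow = nrow(df), ncol = length(words),
               dimnames = list(NULL, words))

# paste the matching words together and count them per row
df$Words <- apply(hits, 1, function(i) paste(words[i], collapse = ","))
df$Count <- rowSums(hits)
```

Note that the words are treated as regular expressions here, so a word containing characters such as "." or "(" would need escaping first.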
Here's another option, which stores the matches in lists:
vgrepl <- Vectorize(grepl, "pattern")
df$Match <- lapply(df$Response, function(x)
  words[vgrepl(words, x, ignore.case = TRUE)]
)
df$Count <- lengths(df$Match)
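Because Match is a list column in this version, it prints differently from the comma-separated strings asked for in the question. A small sketch of how it could be flattened is below. The df construction and the column name MatchText are assumptions added for illustration only.

```r
words <- c("Awesome", "Loss", "Good", "Bad")

# example data, reconstructed from the question (assumed layout)
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# grepl vectorized over the pattern argument, as in the answer above
vgrepl <- Vectorize(grepl, "pattern")

# list column: each element holds the words found in that response
df$Match <- lapply(df$Response, function(x)
  words[vgrepl(words, x, ignore.case = TRUE)]
)
df$Count <- lengths(df$Match)

# collapse each list element into one comma-separated string
# (MatchText is a hypothetical column name)
df$MatchText <- vapply(df$Match, paste, character(1), collapse = ",")
```

Keeping the list column around is handy if you later want to work with the individual words, while the flattened column is closer to the output requested in the question.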
With df as the dataframe and using stringr the following will also work:
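library(stringr)
# one column per word; "" where the word does not occur in a response
matches <- sapply(seq_along(words), function(i)
  str_extract_all(tolower(df$Response), tolower(words[i]), simplify = TRUE))
# join only the non-empty entries, so a gap between two hits cannot glue them together
df$Match <- apply(matches, 1, function(x) paste(x[x != ''], collapse = ','))
df$Count <- apply(matches, 1, function(x) sum(x != ''))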
head(df)
# ID Response Match Count
#1 1 Today is an awesome day awesome 1
#2 2 Yesterday was a bad day,but today it is good good,bad 2
#3 3 I have losses today loss 1
A solution/suggestion using the tidyverse. It reports the actual matched text rather than the patterns, which were matched case-insensitively, but it should be sufficient for illustration purposes.
library(stringr)
library(dplyr)
library(purrr)
words <- c("Awesome", "Loss", "Good", "Bad")
"ID;Response
1;Today is an awesome day
2;Yesterday was a bad day,but today it is good
3;I have losses today" %>%
  textConnection %>%
  read.table(header = TRUE,
             sep = ";",
             stringsAsFactors = FALSE) ->
  d
d %>%
  mutate(matches = str_extract_all(
           Response,
           str_c(words, collapse = "|") %>% regex(ignore_case = TRUE)),
         Match = map_chr(matches, str_c, collapse = ","),
         Count = map_int(matches, length))
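If you want the original spelling from words rather than the matched text, the matches can be mapped back afterwards. The sketch below shows the same single-pattern idea in base R (gregexpr and regmatches instead of str_extract_all), which is a swapped-in technique and not what the answer above uses. The df construction and the names found and canonical are assumptions for illustration.

```r
words <- c("Awesome", "Loss", "Good", "Bad")

# example data, reconstructed from the question (assumed layout)
df <- data.frame(
  ID = 1:3,
  Response = c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today"),
  stringsAsFactors = FALSE
)

# one alternation pattern covering all words
pattern <- paste(words, collapse = "|")

# list of matched substrings per response, in order of appearance
found <- regmatches(df$Response,
                    gregexpr(pattern, df$Response, ignore.case = TRUE))

# map each matched substring back to its spelling in `words`
canonical <- lapply(found, function(m)
  words[match(tolower(m), tolower(words))]
)

df$Match <- vapply(canonical, paste, character(1), collapse = ",")
df$Count <- lengths(canonical)
```

With this approach the words appear in the order they occur in the text, and a word that occurs twice in one response is counted twice. Wrapping m in unique() would count each word once if that is preferred.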