I have a dataset similar to the reprex below, where each subject has more than one row for their hobby, favorite food and study major.
I am trying to identify, for example, the subjects who have hiking as a hobby and meat as a food (in the example below, the only subject meeting these criteria is subject c).
Is there a way to do this in dplyr or another package?
dd = structure(list(ID = c("a", "a", "a", "a", "b", "b", "b", "b",
"b", "b", "c", "c", "c", "c", "c", "c"), itemType = c("hobby",
"hobby", "study", "food", "hobby", "hobby", "study", "study",
"food", "food", "hobby", "hobby", "study", "study", "study",
"food"), details = c("hiking, bike", "reading", "math, art",
"cheese, bread", "writing", "computer", "english", "science",
"meat, rice", "cheese", "reading", "swimming, hiking", "math, philosophy",
"computer", "social", "pasta, meat")), class = "data.frame", row.names = c(NA,
-16L))
If I just try a simple dplyr filter as below, it doesn't work, of course: it returns no rows. Is there another argument or something I can add to make it work?
I have never used the database package; would it be useful in this context?
library(dplyr)
library(stringr)
dd %>%
  filter(str_detect(details, "hiking") &
           str_detect(details, "meat"))
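If we need to subset the 'ID's that have both 'hiking' and 'meat' in 'details', do a group_by on 'ID', apply str_detect for each of 'hiking' and 'meat', wrap each result in any(), and combine the two conditions with & (or with a comma, since filter() combines comma-separated conditions with &).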
library(dplyr)
library(stringr)
dd %>%
group_by(ID) %>%
filter(any(str_detect(details, 'hiking')), any(str_detect(details, 'meat')))
Output:
# A tibble: 6 x 3
# Groups: ID [1]
# ID itemType details
# <chr> <chr> <chr>
#1 c hobby reading
#2 c hobby swimming, hiking
#3 c study math, philosophy
#4 c study computer
#5 c study social
#6 c food pasta, meat
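For comparison, the same group-then-filter logic can be sketched in base R without dplyr: split() plays the role of group_by(ID) and Filter() keeps the groups that satisfy both any() conditions. This is an illustrative alternative, not part of the original answer; it rebuilds dd with data.frame() (equivalent to the structure() call in the question) so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained.
# stringsAsFactors = FALSE keeps character columns on R versions before 4.0.
dd <- data.frame(
  ID = rep(c("a", "b", "c"), times = c(4, 6, 6)),
  itemType = c("hobby", "hobby", "study", "food", "hobby", "hobby", "study", "study",
               "food", "food", "hobby", "hobby", "study", "study", "study", "food"),
  details = c("hiking, bike", "reading", "math, art", "cheese, bread",
              "writing", "computer", "english", "science", "meat, rice", "cheese",
              "reading", "swimming, hiking", "math, philosophy", "computer", "social",
              "pasta, meat"),
  stringsAsFactors = FALSE
)

# One data frame per subject: the base-R counterpart of group_by(ID).
groups <- split(dd, dd$ID)

# Keep a subject if 'hiking' occurs in any of its rows and 'meat' in any
# of its rows (not necessarily the same row).
matched <- Filter(function(g) {
  any(grepl("hiking", g$details)) && any(grepl("meat", g$details))
}, groups)

# Stack the surviving subjects back into one data frame
# (unname() keeps the original row names instead of prefixing "c.").
result <- do.call(rbind, unname(matched))
result
```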
If we want the detection to depend on the item type as well (hiking among the hobbies, meat among the foods), one option is to subset 'details' with == on 'itemType' and apply str_detect only to those elements:
dd %>%
group_by(ID) %>%
filter(any(str_detect(details[itemType == 'hobby'], 'hiking')),
any(str_detect(details[itemType == 'food'], 'meat')))
# A tibble: 6 x 3
# Groups: ID [1]
# ID itemType details
# <chr> <chr> <chr>
#1 c hobby reading
#2 c hobby swimming, hiking
#3 c study math, philosophy
#4 c study computer
#5 c study social
#6 c food pasta, meat
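The itemType-restricted check can also be sketched in base R by computing one TRUE/FALSE flag per subject with vapply() over split(), then keeping the rows of the flagged subjects. Again this is an illustrative alternative that assumes only base R, with dd rebuilt via data.frame() so it runs on its own.

```r
# Rebuild the example data so the snippet is self-contained.
dd <- data.frame(
  ID = rep(c("a", "b", "c"), times = c(4, 6, 6)),
  itemType = c("hobby", "hobby", "study", "food", "hobby", "hobby", "study", "study",
               "food", "food", "hobby", "hobby", "study", "study", "study", "food"),
  details = c("hiking, bike", "reading", "math, art", "cheese, bread",
              "writing", "computer", "english", "science", "meat, rice", "cheese",
              "reading", "swimming, hiking", "math, philosophy", "computer", "social",
              "pasta, meat"),
  stringsAsFactors = FALSE
)

# Per-subject flag: look for 'hiking' only in hobby rows and 'meat' only in
# food rows. any() of an empty selection is FALSE, so subjects lacking an
# item type simply fail the check.
flags <- vapply(split(dd, dd$ID), function(g) {
  any(grepl("hiking", g$details[g$itemType == "hobby"])) &&
    any(grepl("meat", g$details[g$itemType == "food"]))
}, logical(1))

flags                                   # one TRUE/FALSE per subject
dd[dd$ID %in% names(flags)[flags], ]    # all rows of the matching subjects
```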
Or, using base R with ave and grepl:
subset(dd, as.logical(ave(details, ID,
FUN = function(x) any(grepl('hiking', x)) & any(grepl('meat', x)))))
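Both str_detect and grepl match substrings, so a pattern like 'meat' would also match an item such as 'meatballs'. If exact items are wanted, one option is to split each 'details' string on commas and test membership. The helper has_token() below is hypothetical (not from any package), and the sketch assumes items are always comma-separated as in the example data; it reuses the same ave() structure as above.

```r
# Rebuild the example data so the snippet is self-contained.
dd <- data.frame(
  ID = rep(c("a", "b", "c"), times = c(4, 6, 6)),
  itemType = c("hobby", "hobby", "study", "food", "hobby", "hobby", "study", "study",
               "food", "food", "hobby", "hobby", "study", "study", "study", "food"),
  details = c("hiking, bike", "reading", "math, art", "cheese, bread",
              "writing", "computer", "english", "science", "meat, rice", "cheese",
              "reading", "swimming, hiking", "math, philosophy", "computer", "social",
              "pasta, meat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each element of 'details' whose
# comma-separated items include 'token' exactly (after trimming spaces).
has_token <- function(details, token) {
  vapply(strsplit(details, ","), function(items) token %in% trimws(items), logical(1))
}

# Substring matching vs exact item matching on a single string.
grepl("meat", "meatballs, rice")       # TRUE: 'meat' is a substring
has_token("meatballs, rice", "meat")   # FALSE: no item is exactly 'meat'

# Same per-ID logic as the ave() call, but with exact item matching.
keep <- as.logical(ave(dd$details, dd$ID, FUN = function(x)
  any(has_token(x, "hiking")) & any(has_token(x, "meat"))))
dd[keep, ]
```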
The original filter returned no rows because no single element of 'details' contains both 'hiking' and 'meat': & does an elementwise comparison, so each row is tested on its own. Instead, we need to apply & to the results of any() over the elements of 'details' within each 'ID'.
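The difference between the row-level and the per-ID test can be seen directly in base R. This sketch uses grepl() as a stand-in for str_detect() (both return one logical per element here) and tapply() to collapse each ID's rows with any() before combining; dd is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained.
dd <- data.frame(
  ID = rep(c("a", "b", "c"), times = c(4, 6, 6)),
  itemType = c("hobby", "hobby", "study", "food", "hobby", "hobby", "study", "study",
               "food", "food", "hobby", "hobby", "study", "study", "study", "food"),
  details = c("hiking, bike", "reading", "math, art", "cheese, bread",
              "writing", "computer", "english", "science", "meat, rice", "cheese",
              "reading", "swimming, hiking", "math, philosophy", "computer", "social",
              "pasta, meat"),
  stringsAsFactors = FALSE
)

# Row-level test: & compares the two logical vectors element by element,
# so a row passes only if its own 'details' string contains both words.
row_level <- grepl("hiking", dd$details) & grepl("meat", dd$details)
sum(row_level)   # 0: no single string contains both

# Per-ID test: reduce each ID's rows with any() first, then combine with &.
group_level <- tapply(dd$details, dd$ID,
                      function(x) any(grepl("hiking", x)) & any(grepl("meat", x)))
group_level
```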