Sorry for the wordy title - I promise when you look at the example below, the title will be clear. I have the following short dataframe:
dput(mydf)
structure(list(retweet_count = c(186L, 140L, 205L, 30L, 74L,
190L, 27L), hashtags = list("Potato", "Runner", "Money", c("Cheese",
"Potato", "Hammer", "Blue", "Runner", "Fighter"), c("Trust",
"Believe"), "YouCanDoIt", c("Potato", "OneFamily"))), row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 7 x 2
  retweet_count hashtags
          <int> <list>
1           186 <chr [1]>
2           140 <chr [1]>
3           205 <chr [1]>
4            30 <chr [6]>
5            74 <chr [2]>
6           190 <chr [1]>
7            27 <chr [2]>
As the printed tibble shows, the hashtags column in mydf is a list column, and each row holds a character vector. I would like to return a filtered version of this dataframe, keeping only those rows where "Potato" is included (rows 1, 4, and 7). I have tried this:
# whoops had this backwards - fixed now
mydf %>% dplyr::filter("Potato" %in% hashtags)
but this doesn't work. Any help with this is SUPER appreciated, since I have to do this in a few places in my code.
%in% doesn't check nested membership; you need to loop through the column and create a logical vector for filtering:
mydf %>% filter(sapply(hashtags, function(v) 'Potato' %in% v))
# A tibble: 3 x 2
#   retweet_count hashtags
#           <int> <list>
#1           186 <chr [1]>
#2            30 <chr [6]>
#3            27 <chr [2]>
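To make the difference concrete, here is a minimal base R sketch. It uses a shortened stand-in for the hashtags column (the four-element list below is an assumption for illustration, not the full data from the question). Applied to the whole list, %in% returns a single value, while checking each element separately returns one value per row, which is the shape filter() needs.

```r
# Shortened stand-in for the hashtags list column (illustrative data only)
hashtags <- list("Potato", "Runner", c("Cheese", "Potato"), c("Trust", "Believe"))

# %in% uses the whole list as the lookup table, so the result has length 1
# regardless of how many rows the column has
whole_column <- "Potato" %in% hashtags
length(whole_column)

# Testing each element on its own yields one logical value per row
per_row <- sapply(hashtags, function(v) "Potato" %in% v)
per_row
```

As far as I know, dplyr's filter() recycles a length-one condition across all rows, which would explain why the original attempt does not narrow the result down to the matching rows.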
Or use purrr::map_lgl in place of sapply:
mydf %>% filter(purrr::map_lgl(hashtags, ~ 'Potato' %in% .))
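Since the question mentions needing this in several places, one option is to wrap the check in a small helper. The sketch below is base R only; the name has_tag is hypothetical (it is not a dplyr or purrr function), and the two vectors simply mirror the columns of mydf from the question.

```r
# Hypothetical helper (illustrative name, not part of dplyr or purrr):
# returns one TRUE/FALSE per element of a list column
has_tag <- function(tags, tag) {
  # vapply() enforces exactly one logical value per element
  vapply(tags, function(v) tag %in% v, logical(1))
}

# Same values as the two columns of mydf in the question
retweet_count <- c(186L, 140L, 205L, 30L, 74L, 190L, 27L)
hashtags <- list(
  "Potato", "Runner", "Money",
  c("Cheese", "Potato", "Hammer", "Blue", "Runner", "Fighter"),
  c("Trust", "Believe"), "YouCanDoIt", c("Potato", "OneFamily")
)

keep <- has_tag(hashtags, "Potato")
which(keep)          # positions of the matching rows
retweet_count[keep]  # retweet counts for those rows
```

With dplyr loaded, the same helper should slot into the pipeline as mydf %>% filter(has_tag(hashtags, "Potato")). vapply() is used here rather than sapply() because it always returns a logical vector, even for an empty column, where sapply() would return an empty list.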