
Check if string is contained within each vector of a list column in R dataframe

Sorry for the wordy title - I promise when you look at the example below, the title will be clear. I have the following short dataframe:

dput(mydf)
structure(list(retweet_count = c(186L, 140L, 205L, 30L, 74L, 
190L, 27L), hashtags = list("Potato", "Runner", "Money", c("Cheese", 
"Potato", "Hammer", "Blue", "Runner", "Fighter"), c("Trust", 
"Believe"), "YouCanDoIt", c("Potato", "OneFamily"))), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))

# A tibble: 7 x 2
 retweet_count hashtags 
          <int> <list>   
1           186 <chr [1]>
2           140 <chr [1]>
3           205 <chr [1]>
4            30 <chr [6]>
5            74 <chr [2]>
6           190 <chr [1]>
7            27 <chr [2]>

A view of zed shows this:

[image: enter image description here]

You can see that the hashtags column in mydf is of type list, and each row is a vector of strings. I would like to return a filtered version of this dataframe, keeping only those rows where "Potato" is included (rows 1, 4, and 7). I have tried this:

# whoops had this backwards - fixed now
mydf %>% dplyr::filter("Potato" %in% hashtags)

but this doesn't work. Any help with this is SUPER appreciated, since I have to do this in a few places in my code.

%in% doesn't check nested membership: "Potato" %in% hashtags returns a single logical value for the whole column, not one per row. You need to loop through the column and create a logical vector for filtering:

mydf %>% filter(sapply(hashtags, function(v) 'Potato' %in% v))

# A tibble: 3 x 2
#  retweet_count hashtags 
#          <int> <list>   
#1           186 <chr [1]>
#2            30 <chr [6]>
#3            27 <chr [2]>
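
To see why the original attempt cannot pick out individual rows, it can help to look at what each expression returns. The following is a minimal sketch in base R only, assuming a plain data.frame rebuilt from the dput output above; the names whole_column and per_row are just illustrative. vapply is swapped in for sapply here because it always returns a logical vector, even when the column is empty.

```r
# Rebuild the example data as a plain data.frame with a list column
mydf <- data.frame(retweet_count = c(186L, 140L, 205L, 30L, 74L, 190L, 27L))
mydf$hashtags <- list("Potato", "Runner", "Money",
                      c("Cheese", "Potato", "Hammer", "Blue", "Runner", "Fighter"),
                      c("Trust", "Believe"), "YouCanDoIt",
                      c("Potato", "OneFamily"))

# %in% returns one value per element of its left-hand side,
# so this is a single logical for the whole column
whole_column <- "Potato" %in% mydf$hashtags
length(whole_column)

# One value per row: test membership inside each vector
per_row <- vapply(mydf$hashtags, function(v) "Potato" %in% v, logical(1))
per_row

# Base R subsetting with the per-row logical vector
mydf[per_row, ]
```

The per_row vector has one entry for each row of mydf, which is the shape that filter() or base R subsetting needs.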

Or use purrr::map_lgl in place of sapply:

mydf %>% filter(purrr::map_lgl(hashtags, ~ 'Potato' %in% .))
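
Since the same check is needed in a few places, it could be wrapped in a small function. This is only a sketch: has_any_tag is a hypothetical helper name, not part of dplyr or purrr, and it assumes that matching any one of several tags is acceptable behaviour.

```r
# Hypothetical helper: TRUE for each list element that contains any of `wanted`
has_any_tag <- function(tags, wanted) {
  vapply(tags, function(v) any(wanted %in% v), logical(1))
}

# The hashtags column from the question, as a plain list
hashtags <- list("Potato", "Runner", "Money",
                 c("Cheese", "Potato", "Hammer", "Blue", "Runner", "Fighter"),
                 c("Trust", "Believe"), "YouCanDoIt",
                 c("Potato", "OneFamily"))

# Row positions whose vector contains "Potato"
potato_rows <- which(has_any_tag(hashtags, "Potato"))

# Row positions whose vector contains "Potato" or "Trust"
either_rows <- which(has_any_tag(hashtags, c("Potato", "Trust")))
```

Because the helper returns a plain logical vector, it can be used inside the pipeline as mydf %>% filter(has_any_tag(hashtags, "Potato")).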
