R - using map to apply a list function to dataframe column and create new columns with elements of the list

I have a dataframe with an id column and an eats column, and a separate food list. I want to process the dataframe so that a column is added for each food in the food list, populated with 1 if the food is present in eats and 0 otherwise.

library(dplyr)  # provides tibble(), mutate(), if_else() and %>%

txt <- tibble(id = c(1, 2, 3),
          eats = c("apple, oats, banana, milk, sugar",
                   "oats, banana, sugar",
                   "chocolate, milk, sugar"))

food_list <- c("apple", "oats", "chocolate")

for (i in food_list){
  print(i)
  txt <- txt %>% 
    mutate(!!i := if_else(stringr::str_detect(eats, i), 1, 0))
}

I can do this with a for loop, but I am struggling to do it without one. I would be very grateful if someone could point me to how this can be done using the purrr library's map functions instead of a for loop.

Thanks!

We could use map (here, map_dfc) as follows:

library(purrr)
library(dplyr)
library(stringr)
txt <- map_dfc(food_list, ~ txt %>%
      transmute(!! .x := +(stringr::str_detect(eats, .x)))) %>% 
    bind_cols(txt, .)

Output:

txt
# A tibble: 3 x 5
     id eats                             apple  oats chocolate
  <dbl> <chr>                            <int> <int>     <int>
1     1 apple, oats, banana, milk, sugar     1     1         0
2     2 oats, banana, sugar                  0     1         0
3     3 chocolate, milk, sugar               0     0         1

In base R, this can be done in a one-liner:

txt[food_list] <- +(sapply(food_list, grepl, x = txt$eats))
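For readers unfamiliar with the idiom, here is a small sketch of the two pieces the one-liner relies on. The intermediate names hits and flags are introduced only for illustration and are not part of the answer above.

```r
eats <- c("apple, oats, banana, milk, sugar",
          "oats, banana, sugar",
          "chocolate, milk, sugar")
food_list <- c("apple", "oats", "chocolate")

# One grepl() call per food; sapply() binds the results into a logical
# matrix with one column per food, named after the entries of food_list
hits <- sapply(food_list, grepl, x = eats)

# Unary plus coerces TRUE/FALSE to 1/0 and keeps the matrix shape and names
flags <- +hits
```

Assigning flags to txt[food_list] then creates one column per food in a single step.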

You can use cbind and str_detect, with map_dfc:

library(dplyr)
library(purrr)
library(stringr)

cbind(txt, map_dfc(food_list, ~ +str_detect(txt$eats, .x)) %>% set_names(food_list))

  id                             eats apple oats chocolate
1  1 apple, oats, banana, milk, sugar     1    1         0
2  2              oats, banana, sugar     0    1         0
3  3           chocolate, milk, sugar     0    0         1

Here is an alternative solution:

library(dplyr)
library(tidyr)

txt %>%
  separate_rows(eats, sep = ", ") %>%
  rowwise() %>%
  mutate(ext = match(eats, food_list)) %>%
  drop_na() %>%
  pivot_wider(names_from = eats, values_from = ext, values_fn = length, values_fill = 0) %>%
  right_join(txt, by = "id") %>%
  relocate(id, eats)

# A tibble: 3 x 5
     id eats                             apple  oats chocolate
  <dbl> <chr>                            <int> <int>     <int>
1     1 apple, oats, banana, milk, sugar     1     1         0
2     2 oats, banana, sugar                  0     1         0
3     3 chocolate, milk, sugar               0     0         1

You may use base R's Reduce like this:

Reduce(function(a, b) {
  a[[b]] <- +(grepl(b, a[["eats"]]))
  a
}, init = txt, food_list)

# A tibble: 3 x 5
     id eats                             apple  oats chocolate
  <dbl> <chr>                            <int> <int>     <int>
1     1 apple, oats, banana, milk, sugar     1     1         0
2     2 oats, banana, sugar                  0     1         0
3     3 chocolate, milk, sugar               0     0         1

You may also use purrr::reduce similarly, where you can use (i) the walrus operator (:=) and (ii) the bang-bang operator (!!) instead of subsetting:

library(tidyverse)
txt <- tibble(id = c(1, 2, 3),
              eats = c("apple, oats, banana, milk, sugar",
                       "oats, banana, sugar",
                       "chocolate, milk, sugar"))

food_list <- c("apple", "oats", "chocolate")

reduce(food_list, .init = txt, ~ .x %>% 
         mutate(!!.y := +str_detect(eats, .y))
         )
#> # A tibble: 3 x 5
#>      id eats                             apple  oats chocolate
#>   <dbl> <chr>                            <int> <int>     <int>
#> 1     1 apple, oats, banana, milk, sugar     1     1         0
#> 2     2 oats, banana, sugar                  0     1         0
#> 3     3 chocolate, milk, sugar               0     0         1

Created on 2021-07-29 by the reprex package (v2.0.0)

Add word boundaries (\\b) to the values in food_list so that only whole words are matched.

For example, see the difference in outputs in the following case:

library(stringr)
x <- c('apple', 'pineapple')

str_detect(x, 'apple')
#[1] TRUE TRUE

str_detect(x, '\\bapple\\b')
#[1]  TRUE FALSE

The same goes for grepl in base R:

food_list <- c("apple", "oats", "chocolate")
food_pat <- sprintf('\\b%s\\b', food_list)
txt[food_list] <- lapply(food_pat, function(x) as.integer(grepl(x, txt$eats)))
txt

# A tibble: 3 x 5
#     id eats                             apple  oats chocolate
#  <dbl> <chr>                            <int> <int>     <int>
#1     1 apple, oats, banana, milk, sugar     1     1         0
#2     2 oats, banana, sugar                  0     1         0
#3     3 chocolate, milk, sugar               0     0         1
