
Create a variable listing grouped unique values of several other variables with name pattern

My problem is an extension of this one:

Create a list of all values of a variable grouped by another variable in R

Let's say we have a data frame with restaurants and the meals they offer by type and course:

    food <- data.frame(course = c("starter", "starter", "starter", "main", "main", "main", "main", "main"),
                      food_type = c("salad", "salad", "salad", "fish", "fish", "pasta", "pasta", "pasta"),
                      restaurant = c("dining_palace", "delicious_kitchen", "food_cube", "dining_palace", "food_cube", "dining_palace", "delicious_kitchen", "food_cube"),
                      meal1 = c("cesar_salad", "green_salad", "green_salad", "codfish", "trout", "spaghetti", "farfalle", "macaroni"),
                      meal2 = c("coleslaw", "tomato_salad", NA, "salmon", "codfish", "tagliatelle", "penne", "farfalle"),
                      meal3 = c(NA, "coleslaw", NA, "tuna", NA, NA, "spaghetti", "ravioli"), stringsAsFactors = FALSE)

food

   course food_type        restaurant       meal1        meal2     meal3
1 starter     salad     dining_palace cesar_salad     coleslaw      <NA>
2 starter     salad delicious_kitchen green_salad tomato_salad  coleslaw
3 starter     salad         food_cube green_salad         <NA>      <NA>
4    main      fish     dining_palace     codfish       salmon      tuna
5    main      fish         food_cube       trout      codfish      <NA>
6    main     pasta     dining_palace   spaghetti  tagliatelle      <NA>
7    main     pasta delicious_kitchen    farfalle        penne spaghetti
8    main     pasta         food_cube    macaroni     farfalle   ravioli

My aim is to generate a variable that contains a list of all meals by course and food type, independent of the offering restaurant. Using the code from the link above with c(meal1, meal2, meal3) gives exactly the desired outcome:

library(dplyr)
selection_per_type <- food %>%
                      group_by(course, food_type) %>%
                      summarise(meals=paste(sort(unique(c(meal1, meal2, meal3))),collapse=",")) %>%
                      ungroup()

    selection_per_type
    
   course  food_type meals                                                
  <chr>   <chr>     <chr>                                                
1 main    fish      codfish,salmon,trout,tuna                            
2 main    pasta     farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad     cesar_salad,coleslaw,green_salad,tomato_salad 
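A side note on why no NA appears in the result even though meal2 and meal3 contain missing values: sort() drops NA by default (its na.last argument defaults to NA), so the missing entries are gone before paste() runs. A minimal sketch with a made-up vector (x is illustrative and not part of the data above):

```r
# Toy vector with a duplicate and missing values (illustrative only)
x <- c("trout", NA, "codfish", "trout", NA)

# unique() removes duplicates but keeps a single NA
deduplicated <- unique(x)

# sort() drops NA by default (na.last = NA), so no explicit NA removal is needed
sorted <- sort(deduplicated)

# Collapse the remaining values into one comma-separated string
meals <- paste(sorted, collapse = ",")
print(meals)
```

If the NA values should be kept instead, sort(x, na.last = TRUE) places them at the end.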

However, I'm looking for a solution that works with a larger number of meal variables, where listing them manually via c() is not practical. As the first n letters of all target variable names are identical, I've tried some versions of "pattern", "grepl" and "regexec", but nothing has worked so far. Any ideas on how to get this done?

If there are more columns, we may use pivot_longer with starts_with("meal") to reshape them to long format, and then group and summarise:

library(dplyr)
library(tidyr)
library(stringr)
food %>% 
  pivot_longer(cols = starts_with("meal"), values_to ='meal') %>% 
  group_by(course, food_type) %>%
  summarise(meals = str_c(unique(sort(na.omit(meal))), 
       collapse = ","), .groups = 'drop')

-output

# A tibble: 3 × 3
  course  food_type meals                                                
  <chr>   <chr>     <chr>                                                
1 main    fish      codfish,salmon,trout,tuna                            
2 main    pasta     farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad     cesar_salad,coleslaw,green_salad,tomato_salad    
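For comparison, the same pattern-based selection can be sketched in base R, which is closer to the grepl attempts mentioned in the question. This is only a sketch under stated assumptions: the names meal_cols, groups and collapse_meals are made up for illustration, and the food data frame is rebuilt here so the block runs on its own.

```r
# Rebuild the example data so the block is self-contained
food <- data.frame(
  course = c("starter", "starter", "starter", "main", "main", "main", "main", "main"),
  food_type = c("salad", "salad", "salad", "fish", "fish", "pasta", "pasta", "pasta"),
  restaurant = c("dining_palace", "delicious_kitchen", "food_cube", "dining_palace",
                 "food_cube", "dining_palace", "delicious_kitchen", "food_cube"),
  meal1 = c("cesar_salad", "green_salad", "green_salad", "codfish", "trout",
            "spaghetti", "farfalle", "macaroni"),
  meal2 = c("coleslaw", "tomato_salad", NA, "salmon", "codfish", "tagliatelle",
            "penne", "farfalle"),
  meal3 = c(NA, "coleslaw", NA, "tuna", NA, NA, "spaghetti", "ravioli"),
  stringsAsFactors = FALSE
)

# Select every column whose name starts with "meal" (hypothetical helper name)
meal_cols <- grep("^meal", names(food), value = TRUE)

# One row per course/food_type combination, in order of first appearance
groups <- unique(food[c("course", "food_type")])

# Collapse the sorted unique meals of one group; sort() drops the NA values
collapse_meals <- function(crs, typ) {
  rows <- food$course == crs & food$food_type == typ
  paste(sort(unique(unlist(food[rows, meal_cols]))), collapse = ",")
}

groups$meals <- mapply(collapse_meals, groups$course, groups$food_type,
                       USE.NAMES = FALSE)
print(groups)
```

Unlike the dplyr version, the groups appear in order of first occurrence (starter first) rather than sorted; the point is only that grep() on names(food) replaces the manual c(meal1, meal2, meal3).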
