Create a variable listing grouped unique values of several other variables with name pattern
My problem is an extension of this one:
Create a list of all values of a variable grouped by another variable in R
Let's say we have a data frame with restaurants and the meals they offer by type and course:
food <- data.frame(course = c("starter", "starter", "starter", "main", "main", "main", "main", "main"),
food_type = c("salad", "salad", "salad", "fish", "fish", "pasta", "pasta", "pasta"),
restaurant = c("dining_palace", "delicious_kitchen", "food_cube", "dining_palace", "food_cube", "dining_palace", "delicious_kitchen", "food_cube"),
meal1 = c("cesar_salad", "green_salad", "green_salad", "codfish", "trout", "spaghetti", "farfalle", "macaroni"),
meal2 = c("coleslaw", "tomato_salad", NA, "salmon", "codfish", "tagliatelle", "penne", "farfalle"),
meal3 = c(NA, "coleslaw", NA, "tuna", NA, NA, "spaghetti", "ravioli"), stringsAsFactors = FALSE)
food
course food_type restaurant meal1 meal2 meal3
1 starter salad dining_palace cesar_salad coleslaw <NA>
2 starter salad delicious_kitchen green_salad tomato_salad coleslaw
3 starter salad food_cube green_salad <NA> <NA>
4 main fish dining_palace codfish salmon tuna
5 main fish food_cube trout codfish <NA>
6 main pasta dining_palace spaghetti tagliatelle <NA>
7 main pasta delicious_kitchen farfalle penne spaghetti
8 main pasta food_cube macaroni farfalle ravioli
My aim is to generate a variable that contains a list of all meals by course and food type, independent of the offering restaurant. Using the code from the link above with c(meal1, meal2, meal3) gives exactly the desired outcome:
library(dplyr)
selection_per_type <- food %>%
group_by(course, food_type) %>%
summarise(meals=paste(sort(unique(c(meal1, meal2, meal3))),collapse=",")) %>%
ungroup()
selection_per_type
course food_type meals
<chr> <chr> <chr>
1 main fish codfish,salmon,trout,tuna
2 main pasta farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad cesar_salad,coleslaw,green_salad,tomato_salad
However, I'm looking for a solution that works with a larger number of meal variables, where listing them manually in c() is not practical. As the first n letters of all target variables are identical, I've tried some versions of "pattern", "grepl" and "regexec", but nothing has worked so far. Any ideas on how to get this done?
If there are more columns, we may use pivot_longer to convert to long format, selecting the columns by their common prefix with starts_with("meal"), and then do a group by and summarise:
library(dplyr)
library(tidyr)
library(stringr)
food %>%
pivot_longer(cols = starts_with("meal"), values_to ='meal') %>%
group_by(course, food_type) %>%
summarise(means = str_c(unique(sort(na.omit(meal))),
collapse = ","), .groups = 'drop')
-output
# A tibble: 3 × 3
course food_type means
<chr> <chr> <chr>
1 main fish codfish,salmon,trout,tuna
2 main pasta farfalle,macaroni,penne,ravioli,spaghetti,tagliatelle
3 starter salad cesar_salad,coleslaw,green_salad,tomato_salad
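For comparison, here is a minimal base R sketch of the same idea without reshaping. It is not the approach used in the answer above, only an illustration: the columns are picked by name with grep("^meal", names(food)), the rows are split by course and food_type, and the selected columns of each piece are flattened with unlist(). The names meal_cols, pieces and collapse_meals are made up for this example.

```r
# Example data from the question
food <- data.frame(
  course = c("starter", "starter", "starter", "main", "main", "main", "main", "main"),
  food_type = c("salad", "salad", "salad", "fish", "fish", "pasta", "pasta", "pasta"),
  restaurant = c("dining_palace", "delicious_kitchen", "food_cube", "dining_palace",
                 "food_cube", "dining_palace", "delicious_kitchen", "food_cube"),
  meal1 = c("cesar_salad", "green_salad", "green_salad", "codfish", "trout",
            "spaghetti", "farfalle", "macaroni"),
  meal2 = c("coleslaw", "tomato_salad", NA, "salmon", "codfish",
            "tagliatelle", "penne", "farfalle"),
  meal3 = c(NA, "coleslaw", NA, "tuna", NA, NA, "spaghetti", "ravioli"),
  stringsAsFactors = FALSE
)

# Select every column whose name starts with "meal" (hypothetical helper name)
meal_cols <- grep("^meal", names(food), value = TRUE)

# One data frame per (course, food_type) combination that actually occurs
pieces <- split(food, list(food$course, food$food_type), drop = TRUE)

# Flatten the meal columns of one piece into a single comma-separated string.
# sort() drops NA by default; unique() removes repeated meals.
collapse_meals <- function(d) {
  paste(sort(unique(unlist(d[meal_cols], use.names = FALSE))), collapse = ",")
}

# Build one result row per piece and stack them
selection_per_type <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(course = d$course[1],
             food_type = d$food_type[1],
             meals = collapse_meals(d),
             stringsAsFactors = FALSE)
}))
rownames(selection_per_type) <- NULL
selection_per_type
```

On the example data this should give the same three groups as the pivot_longer version. The reshaping approach is usually easier to read inside a dplyr pipeline; the base R variant mainly shows that the name pattern only needs to be matched against names(food), not against the values.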