Dataframe column looping and string concatenation based on a condition in R (pref dplyr)

Question

I have a 2-column dataframe. The first column, prev_veg, contains a single entry from one class of items (in this case, vegetables). The second column, new_item, holds the incoming grocery items, which belong to different categories (meat, fruit, veg, etc.).

library(tidyverse)
current <- tibble::tribble(
             ~prev_veg,   ~new_item,
             "cabbage",   "lettuce",
                    NA,     "apple",
                    NA,     "beef",
                    NA,   "spinach",
                    NA,  "broccoli",
                    NA,     "mango"
             )
current

I would like to loop through the new_item column and add only vegetables to prev_veg. Any new item that is a vegetable needs to be appended to the existing list. Importantly, I have a vector of all possible vegetables that could occur in that list. The desired dataframe is below.

target_veg <- c("cabbage", "lettuce", "spinach", "broccoli")

desired <- tibble::tribble(
  ~prev_veg,   ~new_item,
  "cabbage",   "lettuce",
  "cabbage, lettuce",     "apple",
  "cabbage, lettuce",     "beef",
  "cabbage, lettuce",   "spinach",
  "cabbage, lettuce, spinach",  "broccoli",
  "cabbage, lettuce, spinach, broccoli",     "mango"
)

desired

Finally, this dataframe has several other data columns that I have not included here (only the relevant columns are shown). Ideally I am looking for a dplyr solution, please.

Answer 1

current <- tibble::tribble(
  ~prev_veg, ~new_item,
  "cabbage", "lettuce",
  NA, "apple",
  NA, "beef",
  NA, "spinach",
  NA, "broccoli",
  NA, "mango"
)
target_veg <- c("cabbage", "lettuce", "spinach", "broccoli")

library(dplyr, warn.conflicts = FALSE)
library(purrr)

current %>%
  mutate(
    prev_veg = accumulate(
      head(new_item, -1),
      ~ if_else(.y %in% target_veg, paste(.x, .y, sep = ", "), .x),
      .init = prev_veg[1]
    )
  )
#> # A tibble: 6 × 2
#>   prev_veg                            new_item
#>   <chr>                               <chr>   
#> 1 cabbage                             lettuce 
#> 2 cabbage, lettuce                    apple   
#> 3 cabbage, lettuce                    beef    
#> 4 cabbage, lettuce                    spinach 
#> 5 cabbage, lettuce, spinach           broccoli
#> 6 cabbage, lettuce, spinach, broccoli mango

Created on 2022-02-24 by the reprex package (v2.0.1)
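The accumulate() call above is a running fold: it starts from prev_veg[1] and, for each incoming item except the last, either appends the item or carries the current string forward. For readers without purrr, the same idea can be sketched in base R with Reduce(accumulate = TRUE). This is only an illustrative sketch, not part of the original answer; the helper name append_if_veg is made up here.

```r
# Same data as in the question, as a plain data.frame
current <- data.frame(
  prev_veg = c("cabbage", NA, NA, NA, NA, NA),
  new_item = c("lettuce", "apple", "beef", "spinach", "broccoli", "mango"),
  stringsAsFactors = FALSE
)
target_veg <- c("cabbage", "lettuce", "spinach", "broccoli")

# Hypothetical helper: append the incoming item only when it is a vegetable
append_if_veg <- function(acc, item) {
  if (item %in% target_veg) paste(acc, item, sep = ", ") else acc
}

# With init supplied and accumulate = TRUE, Reduce returns the initial value
# followed by every intermediate result, i.e. one value per row here
current$prev_veg <- Reduce(
  append_if_veg,
  head(current$new_item, -1),   # the last item never affects any row
  init = current$prev_veg[1],
  accumulate = TRUE
)

current
```

As in the purrr version, the last new_item is dropped from the fold because each row shows the list as it stood before that row's item arrived.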

Answer 2

This can also be done by finding an index with match and then using rowwise to paste

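library(dplyr)
library(tidyr)
current %>%
   # position of each new item in target_veg (NA if not a vegetable), shifted down one row
   mutate(ind = lag(match(new_item, target_veg))) %>%
   # carry the last position forward, then backfill the first row
   fill(ind, .direction = "downup") %>%
   rowwise() %>%
   # paste the first `ind` vegetables into one comma-separated string
   mutate(ind = toString(target_veg[seq(ind)])) %>%
   ungroup() %>%
   # keep prev_veg where it exists, otherwise use the built string
   mutate(prev_veg = coalesce(prev_veg, ind), .keep = "unused")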

-output

# A tibble: 6 × 2
  prev_veg                            new_item
  <chr>                               <chr>   
1 cabbage                             lettuce 
2 cabbage, lettuce                    apple   
3 cabbage, lettuce                    beef    
4 cabbage, lettuce                    spinach 
5 cabbage, lettuce, spinach           broccoli
6 cabbage, lettuce, spinach, broccoli mango

NOTE: rowwise could be slow compared to @IceCreamToucan's accumulate.
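To see why the match approach works, it may help to look at the intermediate index on its own. The sketch below rebuilds the example data and stops after the fill step. The column name pos and the object name steps are introduced here for illustration only and do not appear in the answer.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

current <- tibble::tibble(
  prev_veg = c("cabbage", NA, NA, NA, NA, NA),
  new_item = c("lettuce", "apple", "beef", "spinach", "broccoli", "mango")
)
target_veg <- c("cabbage", "lettuce", "spinach", "broccoli")

steps <- current %>%
  mutate(
    pos = match(new_item, target_veg),  # position in target_veg, NA for non-vegetables
    ind = lag(pos)                      # an item only counts from the following row on
  ) %>%
  fill(ind, .direction = "downup")      # carry the last position forward, backfill row 1

steps
```

Each ind value is then the number of leading target_veg entries to paste together. Because target_veg[seq(ind)] always takes the first ind entries, this approach relies on vegetables arriving in the same order as they are listed in target_veg, which holds for the example data.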
