Changing the values of a subset of data based on a condition in R

Question

I ran into an issue when trying to manually change some values.

Here is my data set:

dat <- read.table(text='
id  Item                                                 Category    Next_Category
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           1
 1  "CRANBERRY 10PKTS CARTON"                            1           1
 1  "CRANBERRY 10PKTS CARTON"                            1           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           NA
', header=TRUE)

You can see that rows 3 and 4 have a Category of 1. The conditions are that rows 3 and 4 contain values that can be found in the previous row (row 2), and that those values continue into the next row (row 5). If so, rows 3 and 4 actually belong to Category 2 instead of Category 1 (yes, I know it is strange, but the requirement is to treat them as the same category).
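A minimal base R sketch of the containment part of this condition follows. It assumes `", "` is the only separator used inside `Item`, and the helper name `items_of_row_in` is hypothetical (it is not from the question). Each Item string is split into individual products, which are then compared row against row:

```r
dat <- read.table(text = '
id  Item                                                 Category    Next_Category
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           1
 1  "CRANBERRY 10PKTS CARTON"                            1           1
 1  "CRANBERRY 10PKTS CARTON"                            1           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           NA
', header = TRUE, stringsAsFactors = FALSE)

# Split each Item string into its individual products (assumes ", " separator)
items <- strsplit(dat$Item, ", ", fixed = TRUE)

# Hypothetical helper: TRUE if every product in row i also appears in row j
items_of_row_in <- function(i, j) all(items[[i]] %in% items[[j]])

items_of_row_in(3, 2)  # TRUE: row 3's product is found in the previous row
items_of_row_in(4, 5)  # TRUE: row 4's product continues into the next row
items_of_row_in(1, 3)  # FALSE: row 1 has BLUEBERRY, which row 3 lacks
```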

I have multiple ids. I would like to identify only this kind of subset of the data so that I get the desired outcome.

I have experimented with taking lag values of Category to create an identifier at every decrease in Category. For now, let's ignore the scenario where Category increases first.
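The lag-based idea can be sketched in base R. This is a minimal sketch under one reading of the rule: within each id, a run of rows whose Category drops below the previous row's Category is relabelled with that previous Category only if every item in the run also appears in the row just before the run and in the row just after it. The function name `relabel_runs` is hypothetical, Category is assumed to contain no NA values, and (as in the question) the case where Category increases first is ignored.

```r
dat <- read.table(text = '
id  Item                                                 Category    Next_Category
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           1
 1  "CRANBERRY 10PKTS CARTON"                            1           1
 1  "CRANBERRY 10PKTS CARTON"                            1           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           NA
', header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: relabel runs of rows whose Category drops below the
# previous row's Category, but only when every item in the run also appears
# in the row just before the run and in the row just after it.
relabel_runs <- function(df) {
  items <- strsplit(as.character(df$Item), ", ", fixed = TRUE)
  new_cat <- df$Category
  # Work on each id separately; split() keeps the original row order within an id
  for (idx in split(seq_len(nrow(df)), df$id)) {
    cats <- df$Category[idx]
    n <- length(cats)
    prev <- c(NA, cats[-n])                       # lagged Category within this id
    drops <- which(!is.na(prev) & cats < prev)    # positions where Category decreases
    for (p in drops) {
      e <- p
      # Extend the run while the following rows keep the same lower Category
      while (e < n && cats[e + 1] == cats[p]) e <- e + 1
      if (e == n) next                            # no next row for the items to continue into
      run <- idx[p:e]
      before <- items[[idx[p - 1]]]
      after <- items[[idx[e + 1]]]
      ok <- vapply(run, function(r) {
        all(items[[r]] %in% before) && all(items[[r]] %in% after)
      }, logical(1))
      if (all(ok)) new_cat[run] <- cats[p - 1]    # take the Category of the row before the run
    }
  }
  df$Category <- new_cat
  df
}

relabel_runs(dat)
```

Because the original `cats` are used for every comparison, relabelling one run does not cascade into the detection of later runs.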

The expected output would be:

id  Item                                                 Category    Next_Category
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           1
 1  "CRANBERRY 10PKTS CARTON"                            2           1
 1  "CRANBERRY 10PKTS CARTON"                            2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           NA

Many thanks in advance!

Answer 1

We create a sequence column ('i1'), reshape from 'wide' to 'long' format by splitting the 'Item' column on ', ' with cSplit, and replace 'Category' with its first value within each 'id'/'Item' group. We then collapse 'Item' back with paste while taking the first 'Category' and 'Next_Category' for each 'id'/'i1' group, and finally assign 'i1' to NULL.

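library(data.table)
library(splitstackshape)
# dt1 is created from 'dat' in the data section below
dt1[, i1 := seq_len(.N)]                          # sequence column marking each original row
cSplit(dt1, "Item", ", ", "long")[,               # one row per individual item
    Category := Category[1L], .(id, Item)         # each item takes the Category of its first row
    ][, .(Item = paste(Item, collapse = ", "),    # collapse the items back per original row
          Category = Category[1L], Next_Category = Next_Category[1L]), .(id, i1)
    ][, i1 := NULL][]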
#   id                                             Item Category Next_Category
#1:  1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             2
#2:  1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             1
#3:  1                          CRANBERRY 10PKTS CARTON        2             1
#4:  1                          CRANBERRY 10PKTS CARTON        2             2
#5:  1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2            NA
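For readers without data.table or splitstackshape, the same split, propagate, and collapse idea can be sketched in base R. This is a minimal sketch, assuming every `Item` is non-empty and uses `", "` as its only separator. Like the data.table code above, it gives each item the Category of its first appearance within an id; it does not check the row-before/row-after condition explicitly.

```r
dat <- read.table(text = '
id  Item                                                 Category    Next_Category
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           1
 1  "CRANBERRY 10PKTS CARTON"                            1           1
 1  "CRANBERRY 10PKTS CARTON"                            1           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           NA
', header = TRUE, stringsAsFactors = FALSE)

# Wide to long: one row per (original row i1, individual item)
items <- strsplit(dat$Item, ", ", fixed = TRUE)
n_items <- lengths(items)
long <- data.frame(
  i1 = rep(seq_len(nrow(dat)), n_items),
  id = rep(dat$id, n_items),
  Item = unlist(items),
  Category = rep(dat$Category, n_items),
  stringsAsFactors = FALSE
)

# Every (id, Item) pair takes the Category of its first occurrence
long$Category <- ave(long$Category, long$id, long$Item, FUN = function(x) x[1L])

# Back to one row per original row: keep the original Item strings and
# take the Category of each row's first item (as Category[1L] does above)
out <- dat
out$Category <- long$Category[!duplicated(long$i1)]
out
```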

We can also use a similar approach with the tidyverse:

library(tidyverse)
dat %>% 
        mutate(i1 = row_number()) %>%                  # integer row index (sorts correctly past 9 rows)
        separate_rows(Item, sep = ", ") %>%            # one row per individual item
        group_by(id, Item) %>%
        mutate(Category = first(Category)) %>%         # each item takes the Category of its first row
        group_by(id, i1) %>%
        summarise(Item = paste(Item, collapse = ", "),  # collapse the items back per original row
                  Category = first(Category),
                  Next_Category = first(Next_Category),
                  .groups = "drop") %>% 
        select(-i1)
# A tibble: 5 × 4
#       id                                             Item Category Next_Category
#    <int>                                            <chr>    <int>         <int>
#1     1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             2
#2     1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             1
#3     1                          CRANBERRY 10PKTS CARTON        2             1
#4     1                          CRANBERRY 10PKTS CARTON        2             2
#5     1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2            NA

data

library(data.table)
dt1 <- as.data.table(dat)

Answer 1
3 2017-02-16 03:50:23
