
How can I combine rows, but paste different values into one column in R?

I want to combine rows that have almost the same values, but I want to keep the values that differ so I won't lose information that I want to analyse later.

I have the following dataset:

SessionId      Client id      Product_type       Item quantity
   1              1               Couch                1              
   1              1               Table                1
   2              2               Couch                1
   2              2               Chair                5

I want to have an output like:

SessionId      Client id      Product_type       Item quantity
   1              1            Couch, Table           2
   2              2            Couch, Chair           6

So I need to merge rows based on the session id. For the product type column I want to paste the names one after another, separated by commas, and for the item quantity column I want to sum the quantities. I have many more columns, but their values are the same within a session and can stay as they are.

Maybe I need to do it in two steps, but I'm not sure how to begin. Hopefully someone can help me out.

Try this.

library(dplyr)

d %>% group_by(SessionId, Client_id) %>% 
  summarise(prod_type = toString(Product_type),
            sum_item_q = sum(Item_quantity, na.rm = TRUE))

Output:

# A tibble: 2 x 4
# Groups:   SessionId [2]
  SessionId Client_id prod_type    sum_item_q
      <int>     <int> <chr>             <int>
1         1         1 Couch, Table          2
2         2         2 Couch, Chair          6

Data:

d <- structure(list(SessionId = c(1L, 1L, 2L, 2L),
                    Client_id = c(1L, 1L, 2L, 2L),
                    Product_type = c("Couch", "Table", "Couch", "Chair"),
                    Item_quantity = c(1L, 1L, 1L, 5L)),
               row.names = c(NA, -4L),
               class = c("data.table", "data.frame"))

This can be achieved like so

df <- read.table(text = "SessionId      'Client id'      Product_type       'Item quantity'
   1              1               Couch                1              
   1              1               Table                1
   2              2               Couch                1
   2              2               Chair                5", header = TRUE)

library(dplyr)

df %>% 
  group_by(SessionId, Client.id) %>% 
  summarise(Product_type = paste(Product_type, collapse = ", "),
            Item.quantity = sum(Item.quantity))
#> # A tibble: 2 x 4
#> # Groups:   SessionId [2]
#>   SessionId Client.id Product_type Item.quantity
#>       <int>     <int> <chr>                <int>
#> 1         1         1 Couch, Table             2
#> 2         2         2 Couch, Chair             6

Created on 2020-05-23 by the reprex package (v0.3.0)

Base R solution:

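# Aggregate each column with its own function, then join the results on the keys.
# Uses df as built by read.table() above, where the key column is named Client.id.
merge(aggregate(Product_type ~ SessionId + Client.id, df, toString),
      aggregate(Item.quantity ~ SessionId + Client.id, df, sum))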
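
The question also mentions many more columns whose values stay the same within a session. A minimal base R sketch of one way to carry those along, assuming they really are constant per SessionId: take the first row of each session and overwrite only the two columns that need combining. The Country column and the collapse_session helper are made up for illustration and are not part of the original data.

```r
# Toy data mirroring the question, plus one hypothetical constant column (Country)
df <- data.frame(
  SessionId     = c(1L, 1L, 2L, 2L),
  Client.id     = c(1L, 1L, 2L, 2L),
  Product_type  = c("Couch", "Table", "Couch", "Chair"),
  Item.quantity = c(1L, 1L, 1L, 5L),
  Country       = c("NL", "NL", "BE", "BE"),
  stringsAsFactors = FALSE
)

# Collapse all rows of one session into a single row (hypothetical helper)
collapse_session <- function(g) {
  out <- g[1, , drop = FALSE]                    # first row supplies the constant columns
  out$Product_type  <- toString(g$Product_type)  # paste product names with ", "
  out$Item.quantity <- sum(g$Item.quantity, na.rm = TRUE)
  out
}

# One data frame per session, collapsed, then stacked back together
result <- do.call(rbind, lapply(split(df, df$SessionId), collapse_session))
rownames(result) <- NULL
print(result)
```

If a supposedly constant column does vary within a session, this silently keeps the first value, so it may be worth checking that assumption before relying on the result.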

