
Split list column into multiple integer columns on R dataframe

I have an R data frame with 2 columns: the ID of the transaction and a list of the products associated with it.

I need a dataset with the same number of rows (one row per transaction) and one column per possible product, where each value runs from 0 to n depending on how many times the transaction contains that product.

Is there a quick way to do this?

Reproducible example

Input

tibble(ID = c('01', '02'),
           Products = list(c('Apple', 'Apple', 'Orange'), c('Pear')))

Output

tibble(ID = c('01', '02'),
       Apple = c(2, 0),
       Orange = c(1, 0),
       Pear = c(0, 1))

# A tibble: 2 x 4
  ID    Apple Orange  Pear
  <chr> <dbl>  <dbl> <dbl>
1 01        2      1     0
2 02        0      0     1

You can do this with unnest_longer() from tidyr, followed by count() from dplyr and spread() from tidyr. Try this:

library(dplyr)
library(tidyr)

tibble(ID = c('01', '02'),
             Products = list(c('Apple', 'Apple', 'Orange'), c('Pear'))) %>% 
  unnest_longer(Products) %>% 
  count(ID, Products) %>% 
  spread(Products, n, fill = 0)
#> # A tibble: 2 x 4
#> # Groups:   ID [2]
#>   ID    Apple Orange  Pear
#>   <chr> <dbl>  <dbl> <dbl>
#> 1 01        2      1     0
#> 2 02        0      0     1

Created on 2020-03-10 by the reprex package (v0.3.0)
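
In current tidyr releases spread() is marked as superseded in favour of pivot_wider(), so the same pipeline can also be written as the sketch below. It assumes tidyr 1.0.0 or later; the names transactions and wide_counts are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same input as in the question; `transactions` is an illustrative name.
transactions <- tibble(
  ID = c("01", "02"),
  Products = list(c("Apple", "Apple", "Orange"), c("Pear"))
)

wide_counts <- transactions %>%
  unnest_longer(Products) %>%   # one row per product occurrence
  count(ID, Products) %>%       # occurrences of each product per transaction
  pivot_wider(
    names_from = Products,      # one column per distinct product
    values_from = n,
    values_fill = list(n = 0L)  # products absent from a transaction become 0
  )

wide_counts
```

Note that count() produces integer counts, so the product columns here are integer rather than double; wrap them in as.numeric() if doubles are required.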
