I have an R data frame with 2 columns: the transaction ID and a list of the products in that transaction.
I need a dataset with the same number of rows (one row per transaction) and one column for every possible product, where each value (0 to n) is the number of times that product appears in the transaction.
Is there a quick way to do this?
Reproducible example
Input
tibble(ID = c('01', '02'),
       Products = list(c('Apple', 'Apple', 'Orange'), c('Pear')))
Output
tibble(ID = c('01', '02'),
       Apple = c(2, 0),
       Orange = c(1, 0),
       Pear = c(0, 1))
# A tibble: 2 x 4
  ID    Apple Orange  Pear
  <chr> <dbl>  <dbl> <dbl>
1 01        2      1     0
2 02        0      0     1
You can do this with unnest_longer() from tidyr. Try this:
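library(dplyr)
library(tidyr)

tibble(ID = c('01', '02'),
       Products = list(c('Apple', 'Apple', 'Orange'), c('Pear'))) %>%
  unnest_longer(Products) %>%      # one row per product occurrence
  count(ID, Products) %>%          # how often each product occurs within each ID
  spread(Products, n, fill = 0)    # one column per product, 0 where it is absent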
#> # A tibble: 2 x 4
#> # Groups:   ID [2]
#>   ID    Apple Orange  Pear
#>   <chr> <dbl>  <dbl> <dbl>
#> 1 01        2      1     0
#> 2 02        0      0     1
Created on 2020-03-10 by the reprex package (v0.3.0)
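Since tidyr 1.0.0, spread() is superseded by pivot_wider(). Below is a minimal sketch of the same pipeline with pivot_wider(), assuming tidyr >= 1.1.0 (where values_fill accepts a single value rather than a named list). The object names df, wide and tab are illustrative only. The last part is a base-R alternative using table() on the flattened products; that is a different technique from the tidyr pipeline above, included only as a cross-check.

```r
library(dplyr)
library(tidyr)

# Same input as in the question
df <- tibble(ID = c('01', '02'),
             Products = list(c('Apple', 'Apple', 'Orange'), c('Pear')))

# tidyr pipeline: pivot_wider() in place of the superseded spread()
wide <- df %>%
  unnest_longer(Products) %>%            # one row per product occurrence
  count(ID, Products) %>%                # occurrences of each product per ID
  pivot_wider(names_from = Products,     # one column per distinct product
              values_from = n,
              values_fill = 0L)          # products missing from a transaction get 0
wide

# Base-R alternative: repeat each ID once per product it contains,
# then cross-tabulate IDs against the flattened product vector
tab <- table(ID = rep(df$ID, lengths(df$Products)),
             Product = unlist(df$Products))
as.data.frame.matrix(tab)
```

Because count() produces an integer n and the fill is 0L, the product columns of wide come out as integers rather than doubles; use values_fill = 0 instead if you prefer the types shown in the reprex output.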