Transform a Dataframe to a Binary Matrix in R
I have a data frame with two columns, CustomerID and StockCode, describing the stock codes bought by each customer over a period of time. There can be multiple observations for the same customer, since a customer might have bought the same item several times, or different items, over that period. The sample data looks as follows:
CustomerID StockCode
12346 23166
12347 16008
12347 17021
12347 20665
12347 20719
12347 20719
12347 20719
12347 20719
12347 20780
12347 20782
12347 20966
12347 21035
I need to reshape the data frame in R so that every stock code appears as a column, without repetition, and each row corresponds to a distinct CustomerID. I have two questions:
1. How can each cell be set to 1 if the customer has at least one matching stock code, and 0 otherwise?
2. How can each cell instead hold the number of times the customer bought that stock code, or 0 if there is no match?
This is easily done with dplyr and tidyr::pivot_wider.
Data
library(dplyr)  # provides %>%, distinct(), mutate(), group_by_all(), count()

example <- data.frame(CustomerID = c(12346, 12347, 12347, 12347, 12347, 12347),
                      StockCode = c(23166, 16008, 17021, 20665, 20719, 20719)
)
Code for Part (1)
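A <- example %>%
  distinct() %>%                 # keep one row per customer/stock code pair
  mutate(Test = 1) %>%           # flag each remaining pair with 1
  tidyr::pivot_wider(values_from = Test, names_from = StockCode) %>%
  replace(is.na(.), 0)           # pairs that never occur become 0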
Output for Part (1)
# A tibble: 2 x 6
CustomerID `23166` `16008` `17021`
<dbl> <dbl> <dbl> <dbl>
1 12346 1 0 0
2 12347 0 1 1
# ... with 2 more variables:
# `20665` <dbl>, `20719` <dbl>
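The result above is a tibble whose first column is still the customer ID. If an actual matrix object is needed, as the title suggests, one possible follow-up is sketched here. The name binary_matrix is made up for illustration, and the sketch assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

example <- data.frame(
  CustomerID = c(12346, 12347, 12347, 12347, 12347, 12347),
  StockCode  = c(23166, 16008, 17021, 20665, 20719, 20719)
)

# Same pipeline as Part (1)
A <- example %>%
  distinct() %>%
  mutate(Test = 1) %>%
  pivot_wider(values_from = Test, names_from = StockCode) %>%
  replace(is.na(.), 0)

# Drop the ID column, convert the rest to a numeric matrix,
# and keep the customer IDs as row names
binary_matrix <- as.matrix(A[, -1])
rownames(binary_matrix) <- as.character(A$CustomerID)
```

Keeping the IDs as row names rather than as a column means every cell of the matrix is a 0/1 value, which is usually what downstream matrix operations expect.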
Code for Part (2)
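B <- example %>%
  group_by_all() %>%             # group by CustomerID and StockCode
  count() %>%                    # n = number of rows per customer/stock code pair
  tidyr::pivot_wider(values_from = n, names_from = StockCode) %>%
  replace(is.na(.), 0)           # pairs that never occur become 0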
Output for Part (2)
> B
# A tibble: 2 x 6
# Groups: CustomerID [2]
CustomerID `23166` `16008` `17021`
<dbl> <int> <int> <int>
1 12346 1 0 0
2 12347 0 1 1
# ... with 2 more variables:
# `20665` <int>, `20719` <int>
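As a cross-check that needs no extra packages, both results can probably also be obtained with base R's table(). This is an alternative sketch, not the approach used in the answer above; the names counts and binary are illustrative only. Note that table() orders the columns by stock code instead of by first appearance.

```r
example <- data.frame(
  CustomerID = c(12346, 12347, 12347, 12347, 12347, 12347),
  StockCode  = c(23166, 16008, 17021, 20665, 20719, 20719)
)

# Contingency table: one row per customer, one column per stock code,
# each cell holding the number of matching rows (Part 2)
counts <- table(example$CustomerID, example$StockCode)

# Any positive count becomes 1, everything else 0 (Part 1)
binary <- (counts > 0) * 1
```

Here the cell for customer 12347 and stock code 20719 holds 2 in counts, because that pair occurs twice in the example data, and 1 in binary.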