Combine character vector with data.frame and complete table

I have a data frame with id numbers, a product variable and a dummy variable that tells whether a product has been bought or not.

set.seed(2019)
library(dplyr)
library(data.table)

df <- data.frame(id = rep.int(c(1:5), 5),
                 bought = 1) %>%
  group_by(id) %>%
  mutate(product = c("244.1","455.2","266.3","777.4","111.1"))

In addition to this, I have a vector of products that I know have not been bought, which I would like to add to the data frame.

products <- c("100.4", "500.1", "200.1", "121.6", "251.7", "215.1", "172.2")

That is, for each user I would like to add the non-bought products with bought = 0.

One way to do this is to create a data frame out of the vector and bind it to the original data frame.

products <- data.frame(product = products)
products$id <- NA
products$bought <- 0

products <- products[, c(2, 3, 1)]

df <- bind_rows(df, products)
#> Warning in bind_rows_(x, .id): binding character and factor vector,
#> coercing into character vector

Then I can use data.table to complete the table, set every NA in bought to 0 and, if I want, filter out every observation with id = NA. (I could use tidyr::complete() as well, but the original data.frame is very large, so I prefer data.table.)

setDT(df)[CJ(id = id, product = product, unique = TRUE), on = .(id, product)][
  is.na(bought), bought := 0][]
#>     id bought product
#>  1: NA      0   100.4
#>  2: NA      0   111.1
#>  3: NA      0   121.6
#>  4: NA      0   172.2
#>  5: NA      0   200.1
#>  6: NA      0   215.1
#>  7: NA      0   244.1
#>  8: NA      0   251.7
#>  9: NA      0   266.3
#> 10: NA      0   455.2
#> 11: NA      0   500.1
#> 12: NA      0   777.4
#> 13:  1      0   100.4
#> 14:  1      1   111.1
#> 15:  1      0   121.6

However, creating a data.frame from the vector seems rather verbose, and I would rather not add the rows with id = NA. Is there a neater way to combine a vector with a data.frame and complete it?

Created on 2019-01-08 by the reprex package (v0.2.1)

A simple solution with data.table:

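products <- c("100.4", "500.1", "200.1", "121.6", "251.7", "215.1", "172.2")

setDT(df)  # convert df to a data.table by reference
# For each id, append one row per non-bought product with bought = 0
rbindlist(lapply(unique(df$id), function(ID) {
  rbind(df[id == ID], data.table(product = products, id = ID, bought = 0))
}))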
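The per-id loop above can probably be avoided by keeping the cross-join idea from the question and feeding CJ() the union of bought and non-bought products directly, so no rows with id = NA are ever created. The following is only a sketch on toy data shaped like the question's; the names dt, not_bought, grid and res are made up for illustration.

```r
library(data.table)

# Toy data mirroring the question: 5 ids, each having bought the same 5 products
dt <- data.table(
  id      = rep(1:5, each = 5),
  product = rep(c("244.1", "455.2", "266.3", "777.4", "111.1"), times = 5),
  bought  = 1
)
# Hypothetical name for the vector of products known not to have been bought
not_bought <- c("100.4", "500.1", "200.1", "121.6", "251.7", "215.1", "172.2")

# Every id crossed with every product, bought or not
grid <- CJ(id = unique(dt$id), product = union(dt$product, not_bought))

# Join dt onto the grid; combinations missing from dt get bought = NA, then 0
res <- dt[grid, on = .(id, product)][is.na(bought), bought := 0][]
```

Because the vector is passed straight into CJ(), there is no intermediate data.frame to build and nothing to filter out afterwards.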

You could also consider merging the two data frames using this data frame:

products <- data.frame(product = rep(products, each = length(unique(df$id))),
                       id = rep(unique(df$id), length(unique(products))))
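The snippet above stops before the two data frames are combined. One possible way to finish it in base R is sketched below, assuming the non-bought products do not overlap with the bought ones, so a plain row bind is enough. The names not_bought, extra and combined are illustrative, and the toy df only mimics the question's data.

```r
# Toy data mirroring the question: 5 ids, each having bought the same 5 products
df <- data.frame(
  id      = rep(1:5, each = 5),
  product = rep(c("244.1", "455.2", "266.3", "777.4", "111.1"), times = 5),
  bought  = 1,
  stringsAsFactors = FALSE
)
# Hypothetical name for the vector of products known not to have been bought
not_bought <- c("100.4", "500.1", "200.1", "121.6", "251.7", "215.1", "172.2")

ids <- unique(df$id)

# One row per (id, non-bought product) pair, flagged as not bought
extra <- data.frame(
  id      = rep(ids, times = length(not_bought)),
  product = rep(not_bought, each = length(ids)),
  bought  = 0,
  stringsAsFactors = FALSE
)

# Stack onto the original rows and sort for readability
combined <- rbind(df, extra)
combined <- combined[order(combined$id, combined$product), ]
```

If the two product sets could overlap, merge(df, extra, all = TRUE) followed by de-duplication would be the safer route.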
