How to create a table in R populated with 1s and 0s to show presence of values from another table?

I'm working with data on people and the class of medicine each was prescribed. It looks something like this (the actual data is read in from a txt file):

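library(data.table)

# Toy data: one row per (patient id, medicine class) record
test <- matrix(c(1,"a",1,"a",1,"b",2,"a",2,"c"),ncol=2,byrow=TRUE)
colnames(test) <- c("id","med")
test <- as.data.table(test)
# Drop repeated (id, med) pairs
test <- unique(test[, 1:2])
test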

The table has about 5 million rows, 45k unique patients, and 49 unique medicines. Some patients have the same medicine listed multiple times; I remove those duplicates. Not all patients have every medicine. I want to turn each of the 49 unique medicines into a separate column, have one row per unique patient, and populate the table with 1s and 0s to show whether the patient has the medicine.

I was trying to use spread or dcast, but there's no value column. I tried to amend this by adding a column of 1s:

test$true <- rep(1, nrow(test))

And then using tidyr:

library(tidyr)
test_wide <- spread(test, med, true, fill = 0)

My original data produced this error, but I'm not sure why the example data doesn't reproduce it:

Error: `var` must evaluate to a single number or a column name, not a list

Please let me know what I can do to make this a better reproducible example. Sorry, I'm really new to this.

Another solution using dplyr:

library(dplyr)
test %>% group_by(id) %>% table()
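
If duplicate rows have not been removed first, a cross-tabulation returns counts rather than 0/1 flags. The following is a minimal base R sketch of turning those counts into presence indicators; it assumes columns named id and med as in the question, and the names counts, presence and presence_df are only illustrative.

```r
# Toy data with a repeated (id, med) pair left in on purpose
test <- data.frame(id = c(1, 1, 1, 2, 2), med = c("a", "a", "b", "a", "c"))

# table() cross-tabulates id against med; the repeated pair gives a count of 2
counts <- table(test$id, test$med)

# Comparing with 0 and adding 0L turns the counts into 0/1 presence flags
presence <- (counts > 0) + 0L

# as.data.frame.matrix keeps the wide layout: one row per id, one column per med
presence_df <- as.data.frame.matrix(presence)
presence_df
```

With 45k patients and 49 medicines the resulting table has roughly 2.2 million cells, so this should fit in memory comfortably, though that has not been tested on the real data.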

It looks like you are trying to do one-hot encoding here. For this, please refer to the "onehot" package. Details are here.

Code for reference:

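library(onehot)
test <- matrix(c(1,"a",1,"a",1,"b",2,"a",2,"c"),ncol=2,byrow=TRUE)
colnames(test) <- c("id","med")
# stringsAsFactors = TRUE so that med becomes a factor (no longer the default since R 4.0)
test <- as.data.frame(test, stringsAsFactors = TRUE)

str(test)
# Go through as.character so the original id values are kept, not the factor level codes
test$id <- as.numeric(as.character(test$id))
str(test)
encoder <- onehot(test)
finaldata <- predict(encoder,test)
finaldata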

Make sure that all the columns that you want to be encoded are of type factor. Also, I have taken the liberty of changing data.table to data.frame.
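
The question mentions that dcast seemed to need a value column. As a hedged alternative, here is a minimal data.table sketch that avoids adding one by aggregating with length; it assumes the same id and med columns, and wide is just an illustrative name.

```r
library(data.table)

# Toy data from the question, with duplicate rows removed
test <- data.table(id = c(1, 1, 1, 2, 2), med = c("a", "a", "b", "a", "c"))
test <- unique(test)

# fun.aggregate = length counts the rows in each (id, med) cell, so no separate
# value column is needed; after unique() every count is either 0 or 1
wide <- dcast(test, id ~ med, value.var = "med", fun.aggregate = length)
wide
```

Here value.var only names the column whose entries are counted; any existing column would give the same counts.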
