I'm working with data on patients and the class of medicine each was prescribed. It looks something like this (the actual data is read in from a txt file):
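library(data.table)

# Toy data: one row per prescription (patient id, medicine class)
test <- matrix(c(1,"a",1,"a",1,"b",2,"a",2,"c"),ncol=2,byrow=TRUE)
colnames(test) <- c("id","med")
test <- as.data.table(test)
# Drop duplicate (id, med) pairs
test <- unique(test[, 1:2])
test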
The table has about 5 million rows, 45k unique patients, and 49 unique medicines. Some patients have multiples of the same medicines, which I remove. Not all patients have every medicine. I want to make each of the 49 unique medicines into separate columns, and have each unique patient be a row, and populate the table with 1s and 0s to show if the patient has the medicine or not.
I was trying to use spread or dcast, but there's no value column. I tried to work around this by adding a column of 1s:
test$true <- rep(1, nrow(test))
And then using tidyr
library(tidyr)
test_wide <- spread(test, med, true, fill = 0)
My original data produced this error but I'm not sure why the new data isn't reproducing it...
Error: `var` must evaluate to a single number or a column name, not a list
Please let me know what I can do to make this a better reproducible example. Sorry, I'm really new to this.
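Another solution using dplyr (run it on the two-column id/med table, before the column of 1s is added; otherwise table() also cross-tabulates that extra column):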
library(dplyr)
test %>% group_by(id) %>% table()
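Since the question mentions dcast, here is a minimal sketch of that route as well. It assumes the data.table package from the question and rebuilds the toy data directly as a data.table. The idea is that an aggregating function can stand in for the missing value column: counting rows per (id, med) cell gives 1 where the patient has the medicine and 0 where not, provided duplicates were removed first.

```r
library(data.table)

# Toy data from the question: one row per (patient, medicine) pair
test <- data.table(
  id  = c(1, 1, 1, 2, 2),
  med = c("a", "a", "b", "a", "c")
)
test <- unique(test)  # drop repeated prescriptions

# fun.aggregate = length counts the rows in each (id, med) cell,
# so no separate value column is needed; empty cells become 0
test_wide <- dcast(test, id ~ med, value.var = "med", fun.aggregate = length)
test_wide
```

If duplicates are left in, the cells hold counts rather than 0/1 flags, so the unique() step matters.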
It looks like you are trying to do one-hot encoding here. For this, see the "onehot" package.
Code for reference:
library(onehot)
test <- matrix(c(1,"a",1,"a",1,"b",2,"a",2,"c"),ncol=2,byrow=TRUE)
colnames(test) <- c("id","med")
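# stringsAsFactors = TRUE so that med becomes a factor (not the default since R 4.0)
test <- as.data.frame(test, stringsAsFactors = TRUE)
str(test)
# Convert id via character; as.numeric() on a factor returns level codes
test$id <- as.numeric(as.character(test$id))
str(test)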
encoder <- onehot(test)
finaldata <- predict(encoder,test)
finaldata
Make sure that all the columns that you want to be encoded are of the type factor. Also, I have taken the liberty of changing data.table to data.frame.
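Note that the encoder above returns one row per input row, while the question asks for one row per patient. As a rough sketch of how the indicator rows could be collapsed, here is a base R variant that does not use onehot at all: model.matrix() builds the indicator columns and rowsum() combines them within each id. The names indicators and per_patient are only illustrative.

```r
# Toy data from the question; med must be a factor
test <- data.frame(
  id  = c(1, 1, 1, 2, 2),
  med = factor(c("a", "a", "b", "a", "c"))
)

# "- 1" drops the intercept, so every level of med gets its own 0/1 column
indicators <- model.matrix(~ med - 1, data = test)

# Sum the indicator rows within each patient, then turn counts into 0/1 flags
per_patient <- (rowsum(indicators, group = test$id) > 0) * 1
per_patient
```

Because the counts are converted to flags at the end, duplicate prescriptions do not need to be removed beforehand in this variant.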