I have a data frame that's roughly 50,000 x 200. The column names come in four types, each with a numeric suffix from 1 to 50 (store1, price1, time1, rate1, store2, price2, time2, rate2,...,store50, price50, time50, rate50). I'm trying to create dummy variables based on the values in each column, but I'm having trouble getting R to resolve the column names inside a loop.
store1 price1 time1 rate1 store2 price2 time2 rate2 ....
A 55.55 08:09 1.44 B 44.44 11:09 1.46
C 55.55 08:09 1.44 G 44.44 11:09 1.46
X 55.55 08:09 1.44 E 44.44 11:09 1.46
D 55.55 08:09 1.44 S 44.44 11:09 1.46
Here's what I have tried so far with no luck.
xform_data <- function(x) {
  for(i in 1:50){
    storeX <- (paste("store",i,sep=""))
    storeX2 <- ifelse(storeX == "A", 1, 2)
    x <- cbind(x, storeX2 )
  }
  x
}
Any suggestions?
The following compares the column name (the string "store1", "store2", ...) rather than the values in that column:
ifelse(storeX == "A", ...
Try:
ifelse(x[,storeX] == "A", ...
Also, all the new columns will be called storeX2. You might prefer to rename them, for example by building the new name from the source column name:
x <- cbind(x, storeX2)
colnames(x)[ncol(x)] <- paste(storeX, "dummy", sep="_")
(I am sure there exist more elegant ways to do it.)
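Putting the two fixes above together, a minimal sketch of the corrected function might look like the following. The toy data frame, the `n` argument, and the `_dummy` suffix are assumptions made for illustration, and the sketch assigns the new column with `x[[name]] <- ...` instead of `cbind` plus a rename, which names the column in the same step.

```r
# Small made-up data frame standing in for the real 50,000 x 200 one.
x <- data.frame(
  store1 = c("A", "C", "X", "D"),
  price1 = c(55.55, 55.55, 55.55, 55.55),
  store2 = c("B", "A", "E", "S"),
  price2 = c(44.44, 44.44, 44.44, 44.44),
  stringsAsFactors = FALSE
)

xform_data <- function(x, n = 50) {
  for (i in seq_len(n)) {
    store_col <- paste("store", i, sep = "")            # e.g. "store1"
    dummy_col <- paste(store_col, "dummy", sep = "_")   # hypothetical name
    # Look the column up by name, then compare its values (not its name) to "A".
    x[[dummy_col]] <- ifelse(x[[store_col]] == "A", 1, 2)
  }
  x
}

# Only two store columns exist in the toy data, so n = 2 here.
result <- xform_data(x, n = 2)
```

With the real data, calling `xform_data(x)` would use the default of 50 store columns.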
@aix gave the proper way to do this with a loop; however, you may find it quicker or easier to use some other tools, depending on what you want your final result to be. Functions like sapply and lapply can be used to process every column of a data frame (or subset of a data frame) the same way. The model.matrix function will convert variables into dummy variables (0's and 1's) in one step. Other tools that may help include factors, switch, and match.
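As a rough illustration of the loop-free tools mentioned above, the sketch below uses sapply to test every store column at once and model.matrix to expand one column into 0/1 indicators. The toy data frame, the `_is_A` suffix, and the variable names are assumptions for the example, not part of the original question.

```r
# Small made-up data frame standing in for the real one.
x <- data.frame(
  store1 = c("A", "C", "X", "D"),
  price1 = c(55.55, 55.55, 55.55, 55.55),
  store2 = c("B", "A", "E", "S"),
  price2 = c(44.44, 44.44, 44.44, 44.44),
  stringsAsFactors = FALSE
)

# Pick out every column whose name is "store" followed by digits.
store_cols <- grep("^store[0-9]+$", names(x), value = TRUE)

# sapply applies the same test to each selected column and returns a matrix
# with one column per store column (1 where the value is "A", 0 otherwise).
is_A <- sapply(x[store_cols], function(col) as.integer(col == "A"))
colnames(is_A) <- paste(store_cols, "is_A", sep = "_")  # hypothetical names

# model.matrix builds one 0/1 indicator column per level of a factor;
# "- 1" drops the intercept so that every level gets its own column.
store1_dummies <- model.matrix(~ factor(store1) - 1, data = x)
```

The sapply result can be attached to the original data with `cbind(x, is_A)`. Whether a single "is it A?" indicator or a full set of per-level indicators is more useful depends on what the dummy variables will feed into.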