
Loop to create dummy variables in R

I am trying to generate dummy variables (must be 1/0) using a loop, based on the most frequent responses of a variable. After lots of googling, I haven't managed to come up with a solution. I have extracted the most frequent responses (strings, say the top 5 are "A","B",...,"E") using

top5 <- names(head(sort(table(data$var1), decreasing = TRUE), 5))

I would like the loop to check whether another variable ("var2") equals each of these values in turn (e.g. "A"), set the dummy to 1 if so and 0 otherwise, and then give a summary using aggregate(). In Stata, I can refer to the loop variable i using `i', but I can't find the equivalent in R. The code that does not work is:

for(i in top5) {
   data$i.dummy <- ifelse(data$var2=="i",1,0)
   aggregate(data$i.dummy~data$age+data$year,data,mean)
}

Any suggestions?

If you want one column per item in your top 5, I would use sapply over the elements of top5. There is no need for ifelse, because == already returns TRUE where the comparison holds and FALSE otherwise, and these convert to 1 and 0.

Here we cbind a matrix of 5 columns, one for each element of top5, containing 1 if the row in data$var2 equals the respective element of top5 and 0 otherwise:

data <- cbind( data , sapply( top5 , function(x) as.integer( data$var2 == x ) ) )

If you want one column for matches of any of top5 it's even easier:

data$dummies <- as.integer( data$var2 %in% top5 )

The as.integer() in both cases is used to turn TRUE or FALSE to 1 and 0 respectively.

A cut down example to illustrate how it works:

set.seed(123)
top2 <- c("A","B")
data <- data.frame( var2 = sample(LETTERS[1:4],6,repl=TRUE) )

#  Make dummy variables, one column for each element in topX vector
data <- cbind( data , sapply( top2 , function(x) as.integer( data$var2 == x ) ) )
data
#  var2 A B
#1    B 0 1
#2    D 0 0
#3    B 0 1
#4    D 0 0
#5    D 0 0
#6    A 1 0

#  Make single column for all elements in topX vector
data$ANY <- as.integer( data$var2 %in% top2 )
data
#  var2 A B ANY
#1    B 0 1   1
#2    D 0 0   0
#3    B 0 1   1
#4    D 0 0   0
#5    D 0 0   0
#6    A 1 0   1

See fortune(312) (from the fortunes package), then read the help ?"[[" and possibly the help for paste0.
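
As a rough sketch of what that hint points at: inside a loop, data$i.dummy always refers to a column literally named "i.dummy", whereas data[[name]] accepts a column name built at run time with paste0. The toy data frame below is hypothetical (only the column names var2, age and year come from the question), and the results list is just one assumed way to keep each aggregate() summary.

```r
# Hypothetical toy data; only the column names follow the question.
data <- data.frame(
  var2 = c("A", "B", "A", "C", "B", "A"),
  age  = c(20, 20, 30, 30, 20, 30),
  year = c(2001, 2001, 2001, 2002, 2002, 2002),
  stringsAsFactors = FALSE
)
top <- c("A", "B")  # stands in for top5

results <- list()
for (i in top) {
  # Build the column name as a string, e.g. "A.dummy"
  col <- paste0(i, ".dummy")
  # [[ ]] evaluates `col`; note the comparison uses i, not the string "i"
  data[[col]] <- as.integer(data$var2 == i)
  # Mean of the dummy within each age/year group
  results[[i]] <- aggregate(data[[col]],
                            by = list(age = data$age, year = data$year),
                            FUN = mean)
}
```

After the loop, results$A and results$B each hold one row per age/year combination, with the group mean in a column named x.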

Then possibly consider using other tools like model.matrix and sapply rather than doing everything yourself using loops.
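
A minimal sketch of the model.matrix route, assuming the same kind of data as in the cut-down example above (the var2 values are typed in by hand here rather than sampled). model.matrix builds an indicator column for every level, so the columns for the most frequent values are picked out afterwards; the names f, mm and dummies are only illustrative.

```r
# Hand-typed stand-in for the example data
data <- data.frame(var2 = c("B", "D", "B", "D", "D", "A"),
                   stringsAsFactors = FALSE)
top2 <- c("A", "B")

f <- factor(data$var2)
# "~ 0 + f" drops the intercept, so every level gets its own 0/1 column
mm <- model.matrix(~ 0 + f)
# Default column names are "fA", "fB", ...; replace them with the bare levels
colnames(mm) <- levels(f)

# Keep only the columns for the values of interest and attach them
dummies <- mm[, top2, drop = FALSE]
data <- cbind(data, dummies)
```

Note that the resulting columns are numeric (double) rather than integer; wrap them in as.integer() if the storage type matters.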
