
Looping over variable names in R

I am a recent convert to R and am struggling to find the R equivalent of the following: looping over variables named with a common prefix plus a number (var1, var2, ..., varn).

Say I have a dataset where each row is a store and each column is the value of that store's revenue in month 1, month 2...month 6. Some made-up data for example:

store = c("a", "b", "c", "d", "c")
rev1 = c(500, 200, 600, 400, 1200) 
rev2 = c(260, 100, 450, 45, 1300)
rev3 = c(500, 150, 610, 350, 900)
rev4 = c(480, 200, 600, 750, 1000)
rev5 = c(500, 68, 750, 350, 1200)
rev6 = c(510, 80, 1000, 400, 1450)
df = data.frame(store, rev1, rev2, rev3, rev4, rev5, rev6) 

I am trying to do something like the following:

varlist <- paste("rev", 1:6)  #create list of variables rev1-rev6 #
for i in varlist {
      highrev[i] <- ifelse(rev[i] > 500, 1, 0) 
}

So for each existing variable rev1:rev6, create a variable highrev1:highrev6 which equals 1 if rev1:rev6 > 500 and 0 otherwise.

Can you suggest an appropriate means of doing this?

In R, we usually don't use loops for such operations. You could simply do:

df[paste0("highrev", 1:6)] <- (df[paste0("rev", 1:6)] > 500) + 0
df
#   store rev1 rev2 rev3 rev4 rev5 rev6 highrev1 highrev2 highrev3 highrev4 highrev5 highrev6
# 1     a  500  260  500  480  500  510        0        0        0        0        0        1
# 2     b  200  100  150  200   68   80        0        0        0        0        0        0
# 3     c  600  450  610  600  750 1000        1        0        1        1        1        1
# 4     d  400   45  350  750  350  400        0        0        0        1        0        0
# 5     c 1200 1300  900 1000 1200 1450        1        1        1        1        1        1
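
The + 0 at the end is what turns the logical comparison into 0/1 values. A minimal sketch of that coercion on a toy vector (the names x, is_high, flag_num and flag_int are made up for illustration and are not from the question):

```r
# Toy vector, made up for illustration
x <- c(500, 200, 600)

is_high  <- x > 500              # logical vector: FALSE FALSE TRUE
flag_num <- is_high + 0          # arithmetic coerces logical to numeric 0/1
flag_int <- as.integer(is_high)  # same values, stored as integer

print(flag_num)
print(flag_int)
```

Either form should work for the new columns; the integer version simply makes the storage type explicit.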

Setup.

varlist  <- paste0("rev",1:6)      # note that this is paste0, not paste
hvarlist <- paste0("hi",varlist)
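
The paste0 remark matters because paste inserts a space between its arguments by default, so the names built in the question would not match any column. A short sketch of the difference (the variable names with_space, no_space and with_paste0 are illustrative only):

```r
# paste() separates its arguments with a space unless sep is changed
with_space  <- paste("rev", 1:3)            # "rev 1" "rev 2" "rev 3"
no_space    <- paste("rev", 1:3, sep = "")  # "rev1" "rev2" "rev3"
with_paste0 <- paste0("rev", 1:3)           # shorthand for sep = ""

print(with_space)
print(identical(no_space, with_paste0))
```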

data.table solution. There is a nice way to do this in data.table:

library(data.table)   # library() stops with an error if the package is missing
setDT(df)[,(hvarlist):=lapply(.SD,function(x)1L*(x>500)),.SDcols=varlist]
#    store rev1 rev2 rev3 rev4 rev5 rev6 hirev1 hirev2 hirev3 hirev4 hirev5 hirev6
# 1:     a  500  260  500  480  500  510      0      0      0      0      0      1
# 2:     b  200  100  150  200   68   80      0      0      0      0      0      0
# 3:     c  600  450  610  600  750 1000      1      0      1      1      1      1
# 4:     d  400   45  350  750  350  400      0      0      0      1      0      0
# 5:     c 1200 1300  900 1000 1200 1450      1      1      1      1      1      1

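The dplyr package is also designed with this sort of operation in mind. Older versions could not create new columns under programmatically built names; since dplyr 1.0.0, mutate() combined with across() and its .names argument covers this case.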


A bad alternative. Here's another way, hewing closely to the OP's loop:

within(df,{for(i in 1:6) assign(hvarlist[i],1L*(get(varlist[i]) > 500));rm(i)})
#   store rev1 rev2 rev3 rev4 rev5 rev6 hirev6 hirev5 hirev4 hirev3 hirev2 hirev1
# 1     a  500  260  500  480  500  510      1      0      0      0      0      0
# 2     b  200  100  150  200   68   80      0      0      0      0      0      0
# 3     c  600  450  610  600  750 1000      1      1      1      1      0      1
# 4     d  400   45  350  750  350  400      0      0      1      0      0      0
# 5     c 1200 1300  900 1000 1200 1450      1      1      1      1      1      1

You can't assign to dynamic variable names with hvarlist[i] <- ...; this is done instead with assign(hvarlist[i],...), but using the latter is not a good habit. Similarly, get must be used to grab a variable on the basis of a string containing its name.
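
One way to avoid the assign/get habit, when the vectors are not yet in a data frame, is to keep them in a named list and build the results as another named list. This is only a sketch under that assumption; the names revs and his and the toy values are invented for illustration:

```r
# Toy revenue vectors kept together in a named list (values made up)
revs <- list(
  rev1 = c(500, 200),
  rev2 = c(260, 700)
)

# One 0/1 integer vector per element; no assign() or get() needed
his <- lapply(revs, function(x) 1L * (x > 500))

# Derive the new names from the old ones
names(his) <- paste0("hi", names(revs))

str(his)
```

Elements can then be looked up by a string with his[["hirev2"]], which plays the role that get would otherwise play.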

If you want to keep the loop, you could try the following. First, the data from the question:

store = c("a", "b", "c", "d", "c")
rev1 = c(500, 200, 600, 400, 1200) 
rev2 = c(260, 100, 450, 45, 1300)
rev3 = c(500, 150, 610, 350, 900)
rev4 = c(480, 200, 600, 750, 1000)
rev5 = c(500, 68, 750, 350, 1200)
rev6 = c(510, 80, 1000, 400, 1450)
df = data.frame(store, rev1, rev2, rev3, rev4, rev5, rev6)

As David points out, you don't need ifelse, since > is vectorized and works on the entire data frame:

df[, -1] > 500

#       rev1  rev2  rev3  rev4  rev5  rev6
# [1,] FALSE FALSE FALSE FALSE FALSE  TRUE
# [2,] FALSE FALSE FALSE FALSE FALSE FALSE
# [3,]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
# [4,] FALSE FALSE FALSE  TRUE FALSE FALSE
# [5,]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Here is your loop, slightly amended:

for (i in 1:6) {
  x <- paste0('rev', i)
  y <- paste0('highrev', i)
  df[, y] <- (df[, x] > 500) + 0L
}

#   store rev1 rev2 rev3 rev4 rev5 rev6 highrev1 highrev2 highrev3 highrev4 highrev5 highrev6
# 1     a  500  260  500  480  500  510        0        0        0        0        0        1
# 2     b  200  100  150  200   68   80        0        0        0        0        0        0
# 3     c  600  450  610  600  750 1000        1        0        1        1        1        1
# 4     d  400   45  350  750  350  400        0        0        0        1        0        0
# 5     c 1200 1300  900 1000 1200 1450        1        1        1        1        1        1
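
If the number of rev columns is not known in advance, the column names can be discovered with grep rather than hard-coding 1:6. The helper below is a hypothetical sketch, not part of any answer above: the function name add_high_flags, its arguments prefix and cutoff, and the toy data are all assumptions, and prefix is assumed to contain no regular-expression special characters.

```r
# Hypothetical helper: flag every column named <prefix><digits>
add_high_flags <- function(data, prefix = "rev", cutoff = 500) {
  # Match names such as rev1, rev2, ..., rev12, but not "revenue"
  pattern <- paste0("^", prefix, "[0-9]+$")
  cols <- grep(pattern, names(data), value = TRUE)

  # Create one integer 0/1 column per matched column
  data[paste0("high", cols)] <- lapply(
    data[cols],
    function(x) as.integer(x > cutoff)
  )
  data
}

# Toy data, made up for illustration
toy <- data.frame(store = c("a", "b"),
                  rev1  = c(500, 600),
                  rev2  = c(700, 100))
out <- add_high_flags(toy)
print(out)
```

Wrapping the logic in a function keeps the threshold and the prefix as parameters, so the same code can be reused for other column families.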
