
R: looping through column names

I am a Stata user trying to switch to R and having the usual beginner's struggle. I have been trying (and failing) to do a loop for a few days and I now surrender. What I want to do (in a loop):

  • start from a list of variable names

  • create a new variable

  • recode that new variable(s) based on the value of existing variables

  • possibly do so using the dplyr syntax, but this is not essential, only for consistency with the rest of my code.

Here is a stylised example of what I am trying to do. In my actual data, the .x and .y variables come from applying a join function to two existing data frames.

N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)

varlist <- c("x2","x3")
lapply(varlist, function(x) {
   df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y
  })

When I run the lapply part of the code I get the error message

Error: unexpected '}' in: " df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing xx and xy }"

even though it should be expected... I am sure there are a number of mistakes in my code, partly because I am used to macros in Stata, for which there is no direct equivalent in R. Anyway, if you can point me in the right direction it would be fantastic!

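Your code does not do what you want because paste0(x, ".y") only builds a character string such as "x2.y". Inside ifelse() that string is returned as the value; nothing tells R to look up the column with that name. (The "unexpected '}'" parse error itself is caused by the mutate( call never being closed, but adding the missing parenthesis alone would still give you a column of strings.)

What you actually should be doing is subsetting the data by the column name that paste0(x, ".y") generates. For example, to get the column x2.y you can write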

df[, paste0(varlist[1], ".y")]
## and of course the same can be done for second item of varlist
# df[, paste0(varlist[2], ".y")]
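
As a quick check that building the name with paste0() and then indexing really returns the same column as typing the name out, here is a small sketch. The data frame toy and the variable col_name are made up for this illustration and are not part of the question's data.

```r
# Toy data, invented for this illustration only
toy <- data.frame(x1 = c(-1, 2), x2.x = c(10, 20), x2.y = c(-10, -20))
varlist <- c("x2")

col_name <- paste0(varlist[1], ".y")   # just a string: "x2.y"
by_name  <- toy[, col_name]            # the column that string names
by_name2 <- toy[[col_name]]            # equivalent; always returns a vector

identical(by_name, toy$x2.y)           # TRUE
identical(by_name2, toy$x2.y)          # TRUE
```

The [[ form is often preferred inside functions because it returns a vector regardless of the kind of data frame, whereas [, name] can behave differently on tibbles.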

Now we know how to subset columns by a variable name. Because you want to write this as a loop, we can replace the numbers in varlist[1] (and varlist[2]) with a 'looping' variable.

Here are two ways to do it, one using a for loop and the other using sapply.

For loop

for(i in varlist){
  df[, i] <- ifelse(df[, "x1"] < 0, df[, paste0(i, ".y")], df[, paste0(i, ".x")])
}

head(df)
#            x1       x2.x       x2.y     x3.x       x3.y         x2        x3
# 1 -0.56047565  1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749  0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3  1.55870831  1.9820198 -2.5415892 1.551835 -2.3289958  1.9820198  1.551835
# 4  0.07050839  1.8678249 -0.7807724 2.302715 -4.2841578  1.8678249  2.302715
# 5  0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428  5.598490
# 6  1.71506499  3.0405735 -2.6152683 2.962585 -0.7946739  3.0405735  2.962585

sapply

You can also do this with a function from the *apply family. Here I use sapply, which 'simplifies' the result to a matrix with one column per element of varlist (lapply would return a list instead), so it can be assigned straight into df[, varlist].

df[, varlist] <- sapply(varlist, function(x){
   ifelse(df[, "x1"] < 0, df[, paste0(x, ".y")], df[, paste0(x, ".x")])
})

head(df)
#            x1       x2.x       x2.y     x3.x       x3.y         x2        x3
# 1 -0.56047565  1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749  0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3  1.55870831  1.9820198 -2.5415892 1.551835 -2.3289958  1.9820198  1.551835
# 4  0.07050839  1.8678249 -0.7807724 2.302715 -4.2841578  1.8678249  2.302715
# 5  0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428  5.598490
# 6  1.71506499  3.0405735 -2.6152683 2.962585 -0.7946739  3.0405735  2.962585
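
For comparison, here is a sketch of the same recode with lapply instead of sapply. It relies on the fact that assigning a list to df[varlist] creates one column per list element, which skips the intermediate matrix and so keeps each column's own type. The data are the same simulated data as in the Data section; the loop variable v is just a name chosen here.

```r
set.seed(123)
N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

# lapply() returns a list with one vector per name; assigning that list to
# df[varlist] creates (or overwrites) one column per list element.
df[varlist] <- lapply(varlist, function(v) {
  ifelse(df[["x1"]] < 0, df[[paste0(v, ".y")]], df[[paste0(v, ".x")]])
})

head(df)
```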

Data

set.seed(123)   ## setting the seed as we're sampling
N <- 1000
df  <- data.frame(x1 = rnorm(N),
                  x2.x = rnorm(N)+2,x2.y = rnorm(N)-2,
                  x3.x = rnorm(N)+3,x3.y = rnorm(N)-3)

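Try this:

Replace mutate with mutate_, the standard-evaluation variant described in the vignette below. Note that mutate_ has been deprecated since dplyr 0.7.0 in favour of tidy evaluation, so this applies mainly to older dplyr versions.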

https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html
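
With a reasonably recent dplyr, the same idea can be sketched without the underscore verbs, using the .data pronoun to look up a column by a string and !! with := to name the new column from a string. This is only one possible way to write it, assuming dplyr is installed; the loop variable v is a name chosen for this example.

```r
library(dplyr)

set.seed(123)
N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

for (v in varlist) {
  df <- df %>%
    # !!v := ...        names the new column after the string held in v
    # .data[["name"]]   fetches the existing column with that name
    mutate(!!v := if_else(x1 < 0,
                          .data[[paste0(v, ".y")]],
                          .data[[paste0(v, ".x")]]))
}

head(df)
```

A plain for loop is used here because each iteration reassigns df in the calling environment, which an anonymous function passed to lapply would not do.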

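This ran without a parse error for me:

lapply(varlist, function(x) 
  df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate variable "x" values from existing x.x and x.y
))

The braces are optional when the function body is a single expression, so they can be dropped. What actually removes the parse error here is the extra closing parenthesis, which closes the mutate( call that was left open in the question. Note that this version still stores the strings "x2.y"/"x2.x" in a column literally named x, and only in the list returned by lapply, so it runs without error but does not recode the variables. See ?lapply for more on lapply syntax.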
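
One more point about the lapply call above: assigning to df inside the anonymous function only changes a local copy, so the df in your workspace is left untouched. A minimal sketch with made-up data (the column names a and b are hypothetical):

```r
df <- data.frame(x1 = c(-1, 1))

# Each call works on its own local copy of df; the outer df is not modified.
res <- lapply(c("a", "b"), function(v) {
  df[[v]] <- df$x1 * 2
  df
})

names(df)        # only "x1": the outer df is unchanged
names(res[[1]])  # "x1" "a": the new column exists only in the returned copy
```

To keep the new columns, either assign the result back to the data frame or use a for loop, which runs in the calling environment.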
