I am a Stata user trying to switch to R and having the usual beginner's struggle. I have been trying (and failing) to do a loop for a few days and I now surrender. What I want to do (in a loop):
start from a list of variable names
create a new variable
recode that new variable(s) based on the value of existing variables
possibly do so using the dplyr syntax, but this is not essential, only for consistency with the rest of my code.
Here is a stylised example of what I am trying to do. In my actual data, the .x and .y variables originate from the join function applied to 2 existing data frames.
N <- 1000
df <- data.frame(x1 = rnorm(N),
x2.x = rnorm(N)+2,x2.y = rnorm(N)-2,
x3.x = rnorm(N)+3,x3.y = rnorm(N)-3)
varlist <- c("x2","x3")
lapply(varlist, function(x) {
df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y
})
When I run the lapply part of the code I get the error message
Error: unexpected '}' in: " df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing xx and xy }"
even though it should be expected... I am sure there are a number of mistakes in my code, partly because I am used to macros in Stata, for which there is no direct equivalent in R. Anyway, if you can point me in the right direction it would be fantastic!
The reason your code doesn't work is that paste0(x, ".y") simply pastes x together with ".y" and returns a character string such as "x2.y". That's all it does: nothing tells R to take the column of that name from the data, so ifelse() fills the new variable with the strings themselves.
What you should be doing instead is subsetting the data by the column name that paste0(x, ".y") generates. For example, to get the column x2.y you can write:
df[, paste0(varlist[1], ".y")]
## and of course the same can be done for second item of varlist
# df[, paste0(varlist[2], ".y")]
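To make the difference concrete, here is a minimal sketch contrasting the string that paste0() returns with the column it names. The small data frame toy and the names col_name and y_col are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(x1 = c(-1, 1), x2.x = c(10, 20), x2.y = c(-10, -20))

col_name <- paste0("x2", ".y")  # just a character string: "x2.y"
is.character(col_name)          # TRUE

y_col <- toy[, col_name]        # the actual column values: -10 -20
toy[[col_name]]                 # an equivalent way to extract a column by name
```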
Now that we know how to subset columns by a variable name, and because you want to learn how to write it in a loop, we can replace the numbers in varlist[1] (and varlist[2]) with a 'looping' variable.
Here are two ways to do it, one using a for loop and the other using sapply.
for(i in varlist){
df[, i] <- ifelse(df[, "x1"] < 0, df[, paste0(i, ".y")], df[, paste0(i, ".x")])
}
head(df)
# x1 x2.x x2.y x3.x x3.y x2 x3
# 1 -0.56047565 1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749 0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3 1.55870831 1.9820198 -2.5415892 1.551835 -2.3289958 1.9820198 1.551835
# 4 0.07050839 1.8678249 -0.7807724 2.302715 -4.2841578 1.8678249 2.302715
# 5 0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428 5.598490
# 6 1.71506499 3.0405735 -2.6152683 2.962585 -0.7946739 3.0405735 2.962585
You can also do this with one of the *apply functions. In this instance I'm using sapply so that it 'simplifies' the result (here to a matrix with one column per variable), whereas lapply would return a list.
df[, varlist] <- sapply(varlist, function(x){
ifelse(df[, "x1"] < 0, df[, paste0(x, ".y")], df[, paste0(x, ".x")])
})
head(df)
# x1 x2.x x2.y x3.x x3.y x2 x3
# 1 -0.56047565 1.0042013 -2.5116037 2.849693 -2.8034502 -2.5116037 -2.803450
# 2 -0.23017749 0.9600450 -1.7630621 2.672243 -2.3498868 -1.7630621 -2.349887
# 3 1.55870831 1.9820198 -2.5415892 1.551835 -2.3289958 1.9820198 1.551835
# 4 0.07050839 1.8678249 -0.7807724 2.302715 -4.2841578 1.8678249 2.302715
# 5 0.12928774 -0.5493428 -1.8258641 5.598490 -5.0261096 -0.5493428 5.598490
# 6 1.71506499 3.0405735 -2.6152683 2.962585 -0.7946739 3.0405735 2.962585
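For comparison, the lapply variant mentioned above might look like the sketch below. It assumes the same simulated data and varlist as in the question; the list that lapply returns is assigned to df[varlist], which creates one new column per list element.

```r
set.seed(123)
N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

# lapply returns a list holding one numeric vector per name in varlist;
# assigning that list to df[varlist] adds the columns x2 and x3
df[varlist] <- lapply(varlist, function(v) {
  ifelse(df[["x1"]] < 0, df[[paste0(v, ".y")]], df[[paste0(v, ".x")]])
})

head(df)
```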
The data used for the output above:
set.seed(123) ## setting the seed as we're sampling
N <- 1000
df <- data.frame(x1 = rnorm(N),
x2.x = rnorm(N)+2,x2.y = rnorm(N)-2,
x3.x = rnorm(N)+3,x3.y = rnorm(N)-3)
Try replacing mutate with mutate_ (note that the underscored verbs have been deprecated since dplyr 0.7.0). See:
https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html
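Since the underscored verbs such as mutate_ are deprecated in current dplyr, here is a hedged sketch of the same idea in dplyr syntax using tidy evaluation. It assumes dplyr 0.7.0 or later, where the .data pronoun looks up a column from a string and !!v := uses the string in v as the name of the new column. The helper names y_name and x_name are introduced here for illustration only.

```r
library(dplyr)

set.seed(123)
N <- 1000
df <- data.frame(x1 = rnorm(N),
                 x2.x = rnorm(N) + 2, x2.y = rnorm(N) - 2,
                 x3.x = rnorm(N) + 3, x3.y = rnorm(N) - 3)
varlist <- c("x2", "x3")

for (v in varlist) {
  y_name <- paste0(v, ".y")  # e.g. "x2.y"
  x_name <- paste0(v, ".x")  # e.g. "x2.x"
  # .data[[name]] fetches the column called `name`;
  # !!v := creates a column whose name is the string stored in v
  df <- df %>%
    mutate(!!v := ifelse(x1 < 0, .data[[y_name]], .data[[x_name]]))
}

head(df)
```

A plain for loop is used here because each iteration reassigns df, which keeps every new column in the same data frame.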
This worked for me:
lapply(varlist, function(x)
df <- df %>% mutate(x = ifelse(x1 < 0, paste0(x,".y"),paste0(x,".x")) # generate varialble "x" values from existing x.x and x.y
))
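You do not need braces around the function body in lapply when the body is a single expression. Note that the original error came from a missing closing parenthesis for mutate(, not from the braces themselves. Read this for more info on lapply syntax.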
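One caveat may be worth illustrating, as a sketch in base R rather than a definitive diagnosis: an assignment made with <- inside the function passed to lapply is local to that function call, so the data frame outside is left unchanged, and paste0() still yields strings rather than column values. The toy data frame and the name res below are made up for illustration.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(x1 = c(-1, 1), x2.x = c(10, 20), x2.y = c(-10, -20))

res <- lapply("x2", function(x) {
  # This modifies a copy of toy that is local to the function call
  toy$x <- ifelse(toy$x1 < 0, paste0(x, ".y"), paste0(x, ".x"))
  toy
})

names(toy)   # "x1" "x2.x" "x2.y": the outer toy was not modified
res[[1]]$x   # "x2.y" "x2.x": column names as strings, not column values
```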