I want to remove correlated variables and run lasso regression on multiple datasets, so I split my data into two lists: the first list contains the predictor variables and the second contains the targets.
I also want to split the data into training and test sets before fitting the lasso, making predictions, and storing the results in a final data frame.
The main steps:
1- Correlation (delete correlated variables)
2- Divide data into train and test
3- Perform LASSO
4- Make predictions
5- Store predictions in a data frame with their labels
Thanks!
set.seed(99)
library("caret")
# Create data frames
H <- data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
C <- data.frame(replicate(5,sample(0:100,10,rep=FALSE)))
R <- data.frame(replicate(7,sample(0:30,10,rep=TRUE)))
E <- data.frame(replicate(4,sample(0:40,10,rep=FALSE)))
# Create target variables
Y_H <- data.frame(replicate(1,sample(20:35, 10, rep = TRUE)))
Y_H
names(Y_H) <- "label_1"
Y_C <- data.frame(replicate(1,sample(15:65, 10, rep = TRUE)))
names(Y_C) <- "label_2"
Y_R <- data.frame(replicate(1,sample(25:45, 10, rep = TRUE)))
names(Y_R) <- "label_3"
Y_E <- data.frame(replicate(1,sample(21:80, 10, rep = TRUE)))
names(Y_E) <- "label_4"
# Store observations and targets in lists
inputs <- list(H, C, R, E)
targets <- list(Y_H, Y_C, Y_R, Y_E)
# Perform correlation
outputs <- list()
for(df in inputs){
  data.cor <- cor(df)
  high.cor <- findCorrelation(data.cor, cutoff=0.40)
  outputs <- append(outputs, list(df[,-high.cor]))
}
library("glmnet")
lasso_cv <- list()
lasso_model <- list()
for(i in outputs){
  for(j in targets){
    lasso_cv[i] <- cv.glmnet(as.matrix(outputs[[i]]), as.matrix(targets[[j]]), standardize = TRUE, type.measure="mse", alpha = 1,nfolds = 3)
    lasso_model[i] <- glmnet(as.matrix(outputs[[i]]), as.matrix(targets[[j]]),lambda = lasso_cv[i]$lambda_cv, alpha = 1, standardize = TRUE)
  }
}
When I run my for loop, it gives this error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid subscript type 'list'
It seems to me that the error is in the ranges of the two for loops at the end.
You wrote for(i in outputs) and then used as.matrix(outputs[[i]]). With that loop, i is itself a data frame taken from outputs, so at the first iteration you are effectively calling as.matrix(outputs[[outputs[[1]]]]), which is not a valid way to index a list. The same reasoning applies to for(j in targets).
Try replacing those two lines with for(i in seq_len(length(outputs))) and for(j in seq_len(length(targets))). That should work. At the first iteration, as.matrix(outputs[[i]]) then translates to as.matrix(outputs[[1]]), and similarly for targets[[j]], which seems to be what you intended.
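To make the difference concrete, here is a minimal base R sketch. The two small data frames are made-up stand-ins for the real outputs list, and by_element and by_index are illustrative names only. It records what the loop variable is in each style of loop.

```r
# Hypothetical stand-in for the `outputs` list from the question
outputs <- list(data.frame(a = 1:3), data.frame(b = 4:6))

# Looping over the list itself: `i` is each data frame in turn,
# so outputs[[i]] would try to index a list with a data frame
by_element <- character(0)
for (i in outputs) {
  by_element <- c(by_element, class(i))
}

# Looping over positions: `i` is an integer, so outputs[[i]] is valid
by_index <- character(0)
for (i in seq_along(outputs)) {
  by_index <- c(by_index, class(i))
}

print(by_element)
print(by_index)
```

seq_along(outputs) is equivalent here to seq_len(length(outputs)) and is simply a shorter way to write it.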
P.S. I am not sure about the rest of your code. If we check, lasso_cv[i]$lambda_cv returns NULL for every i, so you may want to look into that.
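A likely reason for the NULL: objects returned by cv.glmnet store the selected penalties as lambda.min and lambda.1se (there is no lambda_cv element), and single brackets such as lasso_cv[i] return a one-element list rather than the fitted object. The sketch below is one possible way to put the steps from the question together, not a definitive implementation. It makes several assumptions: the data are simulated stand-ins with 30 rows, the i-th target is paired with the i-th input instead of looping over every combination, the 70/30 split is arbitrary, and the names results and final are hypothetical.

```r
library(glmnet)

set.seed(99)
# Hypothetical stand-ins for `outputs` and `targets` (30 rows each,
# so that a train/test split still leaves enough rows for 3 folds)
outputs <- list(
  data.frame(matrix(rnorm(30 * 5), ncol = 5)),
  data.frame(matrix(rnorm(30 * 4), ncol = 4))
)
targets <- list(
  data.frame(label_1 = rnorm(30)),
  data.frame(label_2 = rnorm(30))
)

lasso_cv <- vector("list", length(outputs))
results  <- vector("list", length(outputs))

for (i in seq_along(outputs)) {
  x <- as.matrix(outputs[[i]])
  y <- targets[[i]][[1]]   # assumption: i-th target belongs to i-th input

  # Train/test split (70/30 is an arbitrary choice)
  train_idx <- sample(nrow(x), size = floor(0.7 * nrow(x)))

  # Cross-validated lasso on the training rows; note `[[i]]`, not `[i]`
  lasso_cv[[i]] <- cv.glmnet(x[train_idx, ], y[train_idx],
                             alpha = 1, nfolds = 3,
                             type.measure = "mse", standardize = TRUE)

  # Predict on the held-out rows at the CV-selected penalty
  pred <- predict(lasso_cv[[i]], newx = x[-train_idx, ], s = "lambda.min")

  # Keep predictions together with the name of their target
  results[[i]] <- data.frame(label     = names(targets[[i]]),
                             observed  = y[-train_idx],
                             predicted = as.numeric(pred))
}

# One data frame with all predictions and their labels
final <- do.call(rbind, results)
head(final)
```

With only 10 rows per dataset, as in the question, a train/test split plus 3-fold cross-validation leaves very few observations per fold, so results on data that small should be treated with caution.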