
I tried to collect my linear regression results in a for loop but got an error. How can I solve it?

I'm new to R and have a question.
I need to test all gene expression values (dat[,28:63], numeric) against various clinical variables (dat[,1:27], factor). My initial code was:

dat <- readRDS("TCGA GLUT data.rds")
str(dat)

a <- round(summary(lm(SLC2A1 ~ Gender, data=dat))$coefficients, 5)
b <- round(summary(lm(SLC2A1 ~ Race, data=dat))$coefficients, 5)
c <- round(summary(lm(SLC2A1 ~ Age_Dx, data=dat))$coefficients, 5)
d <- round(summary(lm(SLC2A1 ~ Recurrence, data=dat))$coefficients, 5)
e <- round(summary(lm(SLC2A1 ~ Vital_Status, data=dat))$coefficients, 5)
f <- round(summary(lm(SLC2A1 ~ Hashimoto, data=dat))$coefficients, 5)
g <- round(summary(lm(SLC2A1 ~ Histologic_Dx, data=dat))$coefficients, 5)
h <- round(summary(lm(SLC2A1 ~ Max_Size, data=dat))$coefficients, 5)    
i <- round(summary(lm(SLC2A1 ~ Metastatic_LN, data=dat))$coefficients, 5)
j <- round(summary(lm(SLC2A1 ~ ETE, data=dat))$coefficients, 5)
k <- round(summary(lm(SLC2A1 ~ T_stage, data=dat))$coefficients, 5)
l <- round(summary(lm(SLC2A1 ~ N_stage, data=dat))$coefficients, 5)
m <- round(summary(lm(SLC2A1 ~ Stage, data=dat))$coefficients, 5)
n <- round(summary(lm(SLC2A1 ~ BRAF_V600E, data=dat))$coefficients, 5)

SLC2A1.result <- rbind(a,b,c,d,e,f,g,h,i,j,k,l,m,n)
SLC2A1.result

Changing every gene name by hand (SLC2A1 -> SLC2A2 -> SLC2A3...) is a lot of work, so I wrote a for loop like this.

result <- data.frame()
for (i in 28:63){
 a <- summary(lm(dat[,i] ~ Gender, data=dat))$coefficients
 b <- summary(lm(dat[,i] ~ Race, data=dat))$coefficients
 c <- summary(lm(dat[,i] ~ Age_Dx, data=dat))$coefficients
 d <- summary(lm(dat[,i] ~ Recurrence, data=dat))$coefficients
 e <- summary(lm(dat[,i] ~ Vital_Status, data=dat))$coefficients
 f <- summary(lm(dat[,i] ~ Hashimoto, data=dat))$coefficients
 g <- summary(lm(dat[,i] ~ Histologic_Dx, data=dat))$coefficients
 h <- summary(lm(dat[,i] ~ Max_Size, data=dat))$coefficients     
 i <- summary(lm(dat[,i] ~ Metastatic_LN, data=dat))$coefficients
 j <- summary(lm(dat[,i] ~ ETE, data=dat))$coefficients
 k <- summary(lm(dat[,i] ~ T_stage, data=dat))$coefficients
 l <- summary(lm(dat[,i] ~ N_stage, data=dat))$coefficients
 m <- summary(lm(dat[,i] ~ Stage, data=dat))$coefficients
 n <- summary(lm(dat[,i] ~ BRAF_V600E, data=dat))$coefficients 
 result[i] <- rbind(a,b,c,d,e,f,g,h,i,j,k,l,m,n)
 }

However, I got an error.

Error in `[.data.frame`(dat, , i) : undefined columns selected

I can't figure out where my error is or how to solve it. Please help me!

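Note that summary(lm(...))$coefficients is a matrix, not a single value: 2x4 for a numeric predictor or a two-level factor, with one extra row for each additional factor level. So rbind(a,b,c,...) in your code builds a matrix with at least 28 rows and 4 columns, and result[i] <- rbind(a,b,c,...) tries to assign that whole matrix to the i-th column of your result data.frame. Also note that the loop body reuses i as a result name (i <- summary(lm(dat[,i] ~ Metastatic_LN, data=dat))$coefficients), so from that line on i is a matrix rather than a column number, which is most likely what triggers the "undefined columns selected" error in dat[,i].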
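To see the shape of the coefficient matrix for yourself, here is a small sketch. It uses the built-in mtcars and iris data sets as stand-ins (they are not the TCGA data from the question), and the names num_coef, fac_coef and stacked are only illustrative.

```r
# Numeric predictor: one intercept row plus one slope row.
num_coef <- summary(lm(mpg ~ wt, data = mtcars))$coefficients
dim(num_coef)   # 2 4

# Factor predictor with three levels: intercept plus two contrast rows.
fac_coef <- summary(lm(Sepal.Length ~ Species, data = iris))$coefficients
dim(fac_coef)   # 3 4

# rbind() stacks the rows, so the result is a matrix, not a single column.
stacked <- rbind(num_coef, fac_coef)
dim(stacked)    # 5 4
```

The four columns are the estimate, standard error, t value and p value, so a single data.frame column cannot hold one of these results.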

I would advise that you create one matrix per gene, as in your first example, and collect these matrices in a list. You can then name the list elements after your genes. This would result in code like the following.

result <- list()
offset <- 27
# Use "col" as the loop index: the name i is needed for the Metastatic_LN
# result, and reusing it would overwrite the column number.
for (col in 28:63){
  a <- summary(lm(dat[,col] ~ Gender, data=dat))$coefficients
  b <- summary(lm(dat[,col] ~ Race, data=dat))$coefficients
  c <- summary(lm(dat[,col] ~ Age_Dx, data=dat))$coefficients
  d <- summary(lm(dat[,col] ~ Recurrence, data=dat))$coefficients
  # more...
  gene.mat <- rbind(a,b,c,d,e,f,g,h,i,j,k,l,m,n)
  result[[col - offset]] <- round(gene.mat, 5)
}
# name the list elements after the gene columns ("SLC2A1", "SLC2A2", ...)
names(result) <- colnames(dat)[28:63]

Then you can access a matrix by using result$SLC2A1 for example.
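If typing fourteen near-identical lines per gene feels error-prone, the same idea can be sketched with two nested lapply() calls and reformulate(), which builds a formula from column names. This is only a minimal sketch on simulated data: the small dat below, the predictors and genes vectors, and the helper fit_gene are all made up for illustration. With the real data you would presumably use your own clinical column names and colnames(dat)[28:63].

```r
# Simulated stand-in for the real data (assumption: not the TCGA file).
set.seed(1)
n <- 40
dat <- data.frame(
  Gender = factor(rep(c("F", "M"), each = n / 2)),
  Age_Dx = rnorm(n, mean = 50, sd = 10),
  SLC2A1 = rnorm(n),
  SLC2A2 = rnorm(n),
  SLC2A3 = rnorm(n)
)

predictors <- c("Gender", "Age_Dx")            # clinical variables to test
genes      <- c("SLC2A1", "SLC2A2", "SLC2A3")  # gene expression columns

# Fit one model per predictor for a single gene and stack the coefficients.
fit_gene <- function(gene) {
  per_predictor <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = gene), data = dat)
    summary(fit)$coefficients
  })
  round(do.call(rbind, per_predictor), 5)
}

# One coefficient matrix per gene, named after the gene.
result <- lapply(genes, fit_gene)
names(result) <- genes
result$SLC2A1
```

Because the formula is built from names, no single-letter variables are needed, so the loop index cannot be overwritten by accident.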
