
R: Error in linear regression model

I have two different data frames on which I would like to perform linear regression.

I have written the following code for it:

mydir<- "/media/dev/Daten/Task1/subject1/t1"
#multiple subject paths should be given here
# read full paths
myfiles<- list.files(mydir,pattern = "regional_vol*",full.names=T)
# initialise the dataframe from first file 

df<- read.table( myfiles[1], header = F,row.names = NULL, skip = 3, nrows = 1,sep = "\t") 
# [-c(1:3),]
df
#read all the other files and update dataframe
#we read 4 lines to read the header correctly, then remove 3 
ans<- lapply(myfiles[-1], function(x){  read.table( x, header = F, skip = 3, nrows = 1,sep = "\t")       })
ans
#update dataframe
#[-c(1:3),]
lapply(ans, function(x){df<<-rbind(df,x)}  )
#this should be the required dataframe

uncorrect<- array(df)

# Linear regression of ICV extracted from global size FSL 
# Location where your icv is located
ICVdir <- "/media/dev/Daten/Task1/T1_Images"
#loading csv file from ICV
mycsv  <- list.files(ICVdir,pattern = "*.csv",full.names = T )
af<- read.csv(file = mycsv,header = TRUE)
ICV<- as.data.frame(af[,2],drop=FALSE)
#af[1,]
#we take into consideration second column  of csv
#finalcsv <-lapply(mycsv[-1],function(x){read.csv(file="global_size_FSL")})
subj1<- as.data.frame(rep(0.824,each=304))

plot(df ~ subj1, data = df,
       xlab = "ICV value of each subject",
       ylab = "Original uncorrected volume",
       main="intercept calculation"
       )

fit <- lm(subj1 ~ df )

The data frame df has 304 values in the following format:

6433 6433     
1430 1430     
1941 1941     
3059 3059     
3932 3932     
6851 6851

and the other data frame, subj1, has 304 values in the following format:

0.824     
0.824     
0.824      
0.824     
0.824

When I run my code, I get the following error:

Error in model.frame.default(formula = subj1 ~ df, drop.unused.levels = TRUE) : 
  invalid type (list) for variable 'subj1'

Any suggestions as to why the data.frame values in the variable subj1 are invalid?

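As mentioned, you are passing data frames as the variables in the formula. lm() expects each variable to be a vector, typically a column of the data frame given in the data argument. Try: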

 fit <- lm(subj1 ~ ., data=df )

This uses every other column of df as a predictor, provided subj1 is the name of a column in df (the dependent variable) and not a separate data frame.
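A minimal sketch of why the error appears, using a handful of made-up values in place of the real 304 rows (the names vol and icv are illustrative and do not come from the question). A data frame is a list, so lm() rejects it as a formula variable, while a numeric vector extracted from it is accepted:

```r
# Hypothetical stand-ins for the question's objects; the values are illustrative.
subj1 <- as.data.frame(rep(0.824, each = 6))   # one-column data frame, built as in the question
vol   <- c(6433, 1430, 1941, 3059, 3932, 6851)

# A data frame is a list, so lm() refuses it as a variable in the formula.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(lm(subj1 ~ vol), error = function(e) conditionMessage(e))
print(res)

# Pulling the column out as a plain numeric vector avoids the type error.
icv <- subj1[[1]]
print(is.list(subj1))    # TRUE
print(is.numeric(icv))   # TRUE

# With a vector on the left-hand side, the call now runs.
fit <- lm(icv ~ vol)
```

This only shows that the type error goes away; whether the resulting model is meaningful is a separate matter, since the response here is a constant.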

If df has two columns that are the predictors and subj1 is the predicted (dependent) variable, combine the two, give the columns proper names, and fit the model in the format above.

Something like:

data <- cbind(df, subj1)
names(data) <- c("var1", "var2", "subj1")
fit <- lm(subj1 ~ var1 + var2, data = data)

Edit: some pointers:

  1. Make sure you use a single data frame that holds all of your independent variables and your dependent variable.
  2. All variables should have the same number of rows.
  3. If an independent variable is a constant, it does not vary with the dependent variable, so it carries no information for the model. If the dependent variable is a constant, there is no point in regressing: we can predict its value with 100% accuracy.
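The pointers above can be sketched as follows. The data frame dat, its column names, and all values are assumptions made up for illustration; they are not the asker's measurements. The first fit uses a constant predictor, as in the question, and the second uses a predictor that actually varies:

```r
# Made-up illustrative data held in ONE data frame with equal-length columns.
dat <- data.frame(
  volume = c(6433, 1430, 1941, 3059, 3932, 6851),
  icv    = rep(0.824, 6)          # constant predictor, as in the question
)

# A constant predictor is collinear with the intercept, so lm() cannot
# estimate a slope for it and reports NA for that coefficient.
fit_const <- lm(volume ~ icv, data = dat)
print(coef(fit_const))

# With a predictor that varies (hypothetical values), a slope can be estimated.
dat$icv <- c(0.80, 0.78, 0.79, 0.82, 0.83, 0.85)
fit_var <- lm(volume ~ icv, data = dat)
print(coef(fit_var))
```

In the first fit the intercept is simply the mean of volume, which is all a constant predictor can offer.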

