
R linear regression issue : lm.fit(x, y, offset = offset, singular.ok = singular.ok, …)

I am trying to fit a regression in R. The following code imports the CSV file without any problem, but the call to lm() fails:

    dat <- read.csv('http://pastebin.com/raw.php?i=EWsLjKNN', sep=";")
    dat # OK Works fine
    Regdata <- lm(Y~., na.action=na.omit, data=dat)
    summary(Regdata)

The regression step does not work. I get this error message:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

Every value in my CSV file is a number, and an empty "cell" holds the value "NA". Some columns have no missing values, while other columns are missing values in some rows.

So I don't understand why I get an error message even with:

na.action=na.omit

PS: The CSV data are available at: http://pastebin.com/EWsLjKNN

You get this error message because every row of your data frame contains at least one missing value. You can check this, for example, by counting the NA values in each row:

 apply(dat,1,function(x) sum(is.na(x)))
 [1] 128 126  82  78  73  65  58  34  31  30  28  30  20  21  12  20  17  16  12  42  50 128

So when you run the regression with lm() and na.action=na.omit, every row of the data frame is removed and there is no data left to fit the regression.
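The effect can be reproduced on a small scale. The sketch below uses a made-up data frame (`toy`, with invented values, not the poster's data) in which every row has exactly one NA, so na.omit leaves nothing for lm() to fit:

```r
# Hypothetical toy data: each row contains at least one NA.
toy <- data.frame(
  Y  = c(1.0, 2.0, 3.0, 4.0),
  X1 = c(NA, 0.5, 1.5, 2.5),
  X2 = c(0.1, NA, 0.3, NA),
  X3 = c(1, 2, NA, 4)
)

# Missing values per row (same idea as the apply() call above).
na_per_row <- rowSums(is.na(toy))

# Rows with no NA at all: the only rows lm() can use under na.omit.
n_complete <- sum(complete.cases(toy))

# With zero complete rows, lm() stops with an error; capture its message.
fit <- tryCatch(
  lm(Y ~ ., data = toy, na.action = na.omit),
  error = function(e) conditionMessage(e)
)
```

Here `n_complete` is 0 and `fit` holds the error message text rather than a model. Running `sum(complete.cases(dat))` on the real data is a quick way to see how many rows would survive na.omit.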

But this is not the main problem. If the data you provided are all the information you have, then you are trying to fit a regression with 165 independent variables (X variables) on only 22 observations. For ordinary least squares to estimate every coefficient, the number of independent variables has to be less than the number of observations.
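To see why this matters even without missing values, here is a minimal sketch on simulated data (the sizes `n` and `p` and all values are made up for illustration). With more predictors than observations, lm() still runs, but it can estimate at most as many coefficients as there are rows and reports the rest as NA:

```r
set.seed(1)

n <- 5  # observations (rows)
p <- 6  # predictors (columns), deliberately more than n

# Simulated predictors X1..X6 and a response Y, with no missing values.
sim <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(sim) <- paste0("X", seq_len(p))
sim$Y <- rnorm(n)

fit <- lm(Y ~ ., data = sim)

# Coefficients lm() could actually estimate (the others are NA).
n_estimated <- sum(!is.na(coef(fit)))

# Residual degrees of freedom: nothing is left to estimate the error.
resid_df <- df.residual(fit)
```

In this sketch only 5 of the 7 coefficients (intercept plus six slopes) are estimated and the residual degrees of freedom are 0, so the fit reproduces the data exactly and says nothing useful about the error.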

I believe I can add a little clarity to this since I personally experienced it, and that's why I am here, except my issue was with gls (the generalized least squares model) rather than the standard linear model. Some similar logic "might" apply here, or in a similar situation.

I don't refute anything that anyone has said thus far. There might be some confusion between what people perceive as an observation and the way R perceives these things.

Say you have 160+ independent variables. Say you have a single source from which all your data come. You import it from a file, database, etc. Say you have an identical number of response variables, or something that satisfies R for the purpose of your regression analysis.

R will tell you that you have 2 observations. Now, if you have similar data obtained in exactly the same manner from another source, you have 3 observations if you look at your global environment in RStudio.

The reason I mention this is that the term "observation" in the mathematical sense (as it's being used here) is completely acceptable. R, however, views an observation in more ways than one.

THAT was a big contributor to a similar problem I had, where R told me I had values missing, na.omit this, na.action that, etc. When I looked at the OrchardSpray demo and reviewed my own methodology, I figured it out.

The point is that how we perceive an "observation" in the data is one thing. R has its own meaning for the term, and the way it words its error messages can cause additional confusion.

See what I mean?
