How can I change the factors to numeric variables, or otherwise deal with this error I'm getting in my linear regression?

Trying to run a linear regression model with this dataset, mro.csv, but when I run lm() it gives the error message:

1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors

Not sure which parts of the dataset are factors rather than numeric; all the data is numbers except the column names. I'm also unsure what the "'-' not meaningful for factors" part is about, because there are no -'s in the dataset either.

Not sure how to share the dataset, but here's the csv in a google sheet: mro.csv

> raw <- read.csv("/Users/cpt.jack/Downloads/mro.csv", header = FALSE, sep = ",")
> colnames(raw) <- c("inlf", "hours", "kidslt6", "kidsge6", "age", "educ", "wage", "repwage", "hushrs", "husage", "huseduc", "huswage", "faminc", "mtr", "motheduc", "fatheduc", "unem", "city", "exper", "nwifeinc", "lwage", "expersq")
> 
> 
> dim(raw)
[1] 753  22
> 
> set.seed(88)
> raw  <- raw[sample(nrow(raw)),]
> 
> 
> raw1<-raw[raw$inlf==1,]
> dim(raw)
[1] 753  22
> dim(raw1)
[1] 428  22
> 
> 
> reg1 <- lm(wage~ hours + kidslt6 + kidsge6 + age + educ + hushrs + husage + huseduc + huswage
+mtr+motheduc+fatheduc+unem
+exper+nwifeinc, data=raw1)

Warning messages:

1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored

2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
> reg1 <- lm(wage~ hours,data=raw1)

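wage and lwage are being read as factors because they contain the value ".", which can't be parsed as numeric. The '-' in the second warning is the subtraction operator, not a character in the data: lm() tries to do arithmetic on the response, which is not defined for a factor. The "." values can be handled manually.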
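To see which columns are affected before fixing anything, the column classes can be inspected directly. The following is a minimal sketch on made-up data: `csv_text` and its values are hypothetical stand-ins for mro.csv, and `stringsAsFactors = TRUE` is set explicitly to reproduce the factor behaviour from the question (newer R versions would read such a column as character instead).

```r
# Hypothetical toy data standing in for mro.csv (not the real file).
csv_text <- "1,1610,3.354\n1,1656,1.3889\n0,0,."

toy <- read.csv(
  text = csv_text,
  header = FALSE,
  col.names = c("inlf", "hours", "wage"),
  stringsAsFactors = TRUE  # reproduce the factor columns seen in the question
)

# Class of every column; anything that is not numeric/integer is suspect.
col_classes <- sapply(toy, class)
print(col_classes)

# Names of the columns that were not parsed as numbers.
non_numeric <- names(toy)[!sapply(toy, is.numeric)]
print(non_numeric)

# The entries in a suspect column that cannot be converted to a number.
wage_chr <- as.character(toy$wage)
bad_values <- unique(wage_chr[is.na(suppressWarnings(as.numeric(wage_chr)))])
print(bad_values)
```

Running the same checks on the data frame from the question should point at wage and lwage and at the "." entries, which the code that follows recodes.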

df <- read.csv(
  "~/Downloads/mro.csv",
  header = FALSE,
  stringsAsFactors = FALSE,
  col.names = c(
    "inlf", "hours", "kidslt6", "kidsge6", "age", "educ",  "wage",
    "repwage", "hushrs", "husage", "huseduc", "huswage",  "faminc",
    "mtr",  "motheduc",  "fatheduc", "unem", "city", "exper",
    "nwifeinc", "lwage", "expersq"
  )
)

df$wage <- as.numeric(ifelse(df$wage == ".", 0, df$wage))
df$lwage <- as.numeric(ifelse(df$lwage == ".", 0, df$lwage))

Now lm() should run without issues.

df <- df[sample(nrow(df)), ]
df1 <- df[df$inlf == 1, ]

reg1 <- lm(
  wage ~ hours + kidslt6 + kidsge6 + age + educ + hushrs + husage + huseduc +
         huswage + mtr + motheduc + fatheduc + unem + exper + nwifeinc,
  data = df1
)
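As an alternative to recoding "." as 0 by hand, read.csv can be told to treat "." as a missing value through its `na.strings` argument, so the affected columns are parsed as numeric with NA in those positions. This is a sketch under assumptions, not part of the original answer: the toy values below are invented, and it assumes "." is the only non-numeric marker in the file.

```r
# Hypothetical toy data standing in for mro.csv (not the real file).
csv_text <- paste(
  "1,1610,3.354,1.2102",
  "1,1656,1.3889,0.3285",
  "1,1980,4.5455,1.5141",
  "1,456,1.0965,0.0921",
  "0,0,.,.",
  sep = "\n"
)

toy <- read.csv(
  text = csv_text,
  header = FALSE,
  na.strings = ".",  # read "." as NA instead of as text
  col.names = c("inlf", "hours", "wage", "lwage")
)

# wage and lwage are now numeric, with NA where the file had ".".
str(toy)

# Same subsetting step as in the question: keep rows with inlf == 1.
toy1 <- toy[toy$inlf == 1, ]

# With R's default na.action, lm() drops any remaining rows that contain NA.
fit <- lm(wage ~ hours, data = toy1)
print(coef(fit))
```

Keeping the missing wages as NA rather than 0 avoids treating them as real observations if the unfiltered data frame is ever used in a model.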

