How can I change the factors to numeric variables or otherwise deal with this error I'm getting in my linear regression?
I'm trying to run a linear regression model with this dataset, mro.csv, but when I run lm() it gives these warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
I'm not sure which parts of the dataset are factors rather than numeric; all the data is numbers except the column names. I'm also unsure what the "'-' not meaningful for factors" part is about, because there are no -'s in the dataset either.
I'm not sure how to share the dataset, but here's the csv in a Google Sheet: mro.csv
> raw <- read.csv("/Users/cpt.jack/Downloads/mro.csv",header<-F,sep<-",")
> colnames(raw)<- c("inlf","hours","kidslt6","kidsge6","age", "educ", "wage", "repwage", "hushrs", "husage", "huseduc","huswage", "faminc", "mtr", "motheduc", "fatheduc", "unem","city", "exper", "nwifeinc", "lwage", "expersq")
>
>
> dim(raw)
[1] 753 22
>
> set.seed(88)
> raw <- raw[sample(nrow(raw)),]
>
>
> raw1<-raw[raw$inlf==1,]
> dim(raw)
[1] 753 22
> dim(raw1)
[1] 428 22
>
>
> reg1 <- lm(wage~ hours + kidslt6 + kidsge6 + age + educ + hushrs + husage + huseduc + huswage
+mtr+motheduc+fatheduc+unem
+exper+nwifeinc, data=raw1)
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
> reg1 <- lm(wage~ hours,data=raw1)
wage and lwage are being read as factors because they contain the value ".", which can't be parsed as numeric. This value can be handled manually.
df <- read.csv(
  "~/Downloads/mro.csv",
  header = FALSE,
  stringsAsFactors = FALSE,
  col.names = c(
    "inlf", "hours", "kidslt6", "kidsge6", "age", "educ", "wage",
    "repwage", "hushrs", "husage", "huseduc", "huswage", "faminc",
    "mtr", "motheduc", "fatheduc", "unem", "city", "exper",
    "nwifeinc", "lwage", "expersq"
  )
)
df$wage <- as.numeric(ifelse(df$wage == ".", 0, df$wage))
df$lwage <- as.numeric(ifelse(df$lwage == ".", 0, df$lwage))
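To see which columns are affected and which values are responsible, a check along the following lines may help. This is only a sketch: it uses a tiny inline data set (toy, with made-up values) in place of mro.csv, and sets stringsAsFactors = TRUE to mimic the default of R versions before 4.0, where such columns become factors.

```r
# Toy data standing in for mro.csv (values are invented for illustration).
# The second column contains "." in one row, like wage in the question.
toy <- read.csv(
  text = "1,3.35\n0,.\n1,4.10",
  header = FALSE,
  stringsAsFactors = TRUE,  # mimics the pre-4.0 default of read.csv
  col.names = c("inlf", "wage")
)

# Class of every column; anything other than integer/numeric needs attention.
col_classes <- sapply(toy, class)
print(col_classes)

# Names of the columns that were not parsed as numbers.
bad_cols <- names(toy)[!sapply(toy, is.numeric)]
print(bad_cols)

# Distinct values in those columns that cannot be converted to numbers.
bad_values <- lapply(toy[bad_cols], function(x) {
  x <- as.character(x)
  unique(x[is.na(suppressWarnings(as.numeric(x)))])
})
print(bad_values)
```

Running the same sapply(df, class) call on the real data frame should show whether wage and lwage are numeric after the conversion.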
With wage and lwage converted to numeric, lm should run without issues.
df <- df[sample(nrow(df)), ]
df1 <- df[df$inlf == 1, ]
reg1 <- lm(
  wage ~ hours + kidslt6 + kidsge6 + age + educ + hushrs + husage + huseduc +
    huswage + mtr + motheduc + fatheduc + unem + exper + nwifeinc,
  data = df1
)
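A possible alternative, assuming "." marks a missing value rather than a true zero (the source does not say which), is to let read.csv turn it into NA through its na.strings argument. The column is then read as numeric directly, and lm drops incomplete rows under its default na.action. The sketch below uses a small inline data set (toy, with made-up values) rather than mro.csv.

```r
# Toy data standing in for mro.csv (values are invented for illustration).
# Row 2 has "." in the wage column.
toy <- read.csv(
  text = "1,1610,3.35\n0,0,.\n1,1656,1.39\n1,1980,4.55\n1,456,1.10",
  header = FALSE,
  na.strings = ".",  # treat "." as missing instead of as text
  col.names = c("inlf", "hours", "wage")
)

# wage is now numeric, with NA where "." appeared.
str(toy$wage)

# Same subsetting step as in the question: keep rows with inlf == 1.
toy1 <- toy[toy$inlf == 1, ]

# The model fits without the factor warnings.
fit <- lm(wage ~ hours, data = toy1)
print(coef(fit))
```

Whether 0 or NA is the better replacement depends on what "." means in the data; since the regression only uses rows with inlf == 1, the choice matters only if any of those rows contain ".".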