
Multiple linear regression model in R

I'm trying to create a multiple linear regression model with this data:

  bweight gestwks            hyp sex
1    2974 38.5200004577637     0 female
2    3270 NA                   0 male
3    2620 38.150001525878899   0 female
4    3751 39.799999237060497   0 male
5    3200 38.889999389648402   1 male
6    3673 40.970001220703097   0 female

To include the string values "male" and "female" in the model, I convert them to the integers 1 and 0, like this:

male = 1*(sex == "male")

Then I create the linear model, where baby weight (bweight) is the outcome variable:

lm2 = lm(bweight ~ gestwks + hyp + male)

But when I look at the parameters of the model, I get this (only part of the output is shown here):

Call:
lm(formula = bweight ~ gestwks + 
    hyp + male)

Coefficients:
              (Intercept)  gestwks26.950000762939499
                  864.000                   -236.000
gestwks27.329999923706101    gestwks27.9899997711182
                    7.363                    146.469
gestwks28.040000915527301    gestwks30.5200004577637
                  184.469                    760.469
gestwks30.649999618530298  gestwks30.709999084472699
                  900.000                   -141.531

And I'm supposed to be getting only one pair of parameters. What am I doing wrong?

Before conducting any analysis, always explore your variables carefully. Pay attention to ranges and distributions for continuous variables and to frequencies for categorical ones. Do this right after importing the data.

In this case, the gestwks variable is not actually numeric. If you look at the output of str(my_data), where my_data is the name of your data frame, you will see the problem with that variable: it is stored as a factor, so lm estimates a separate coefficient for each level instead of a single slope. You probably need to revise the command that imports the data. If the import command is correct, then you'll need to convert the variable to numeric using the appropriate command. Read the Warning section in the help page of as.numeric.*
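As a minimal sketch of that check, the snippet below builds a small data frame modelled on the question and inspects it. The name my_data, the shortened gestwks values, and the assumption that gestwks arrived as a factor are illustrative only; the real import step is not shown in the question.

```r
# Hypothetical data modelled on the question; gestwks is assumed to have
# been imported as text and therefore stored as a factor.
my_data <- data.frame(
  bweight = c(2974, 3270, 2620, 3751, 3200, 3673),
  gestwks = factor(c("38.52", NA, "38.15", "39.80", "38.89", "40.97")),
  hyp     = c(0, 0, 0, 0, 1, 0),
  sex     = c("female", "male", "female", "male", "male", "female")
)

# str() lists gestwks as a Factor rather than num, which reveals the problem.
str(my_data)
is_factor_before <- is.factor(my_data$gestwks)

# Convert through the level labels, not the internal integer codes.
my_data$gestwks <- as.numeric(as.character(my_data$gestwks))
is_numeric_after <- is.numeric(my_data$gestwks)
```

If the column turns out to be a factor only because of the import settings, fixing the import command is usually the cleaner repair than converting afterwards.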

Data management is a key part of your analysis.

Look carefully at gestwks for strange-looking values. table can help if there aren't too many records; otherwise, look at the first and last few sorted values.
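One way to hunt for such values could look like the sketch below. The vector gestwks_raw and its stray entry "4O.97" (with a letter O in place of a zero) are made up for illustration; the question does not show which value, if any, caused the problem.

```r
# Hypothetical raw column containing one entry that is not a valid number.
gestwks_raw <- factor(c("38.52", "38.15", "39.80", "38.89", "4O.97", "38.15"))

# Frequency of each distinct value, including missing values if present.
table(gestwks_raw, useNA = "ifany")

# First and last few values after sorting the labels as text.
sorted_values <- sort(as.character(gestwks_raw))
head(sorted_values, 3)
tail(sorted_values, 3)

# Entries that are present but cannot be parsed as numbers.
labels_chr <- as.character(gestwks_raw)
parsed     <- suppressWarnings(as.numeric(labels_chr))
bad_values <- labels_chr[is.na(parsed) & !is.na(labels_chr)]
bad_values
```

The last step is often the quickest: anything listed in bad_values is a label that as.numeric would silently turn into NA.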

* as.numeric(levels(f))[f] or as.numeric(as.character(f)) is the recommended command.
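The reason for that recommendation can be seen in a short sketch. The factor f below is a made-up example; calling as.numeric on a factor directly returns its internal level codes rather than the numbers shown in its labels.

```r
# Hypothetical factor whose labels look like numbers.
f <- factor(c("38.52", "40.97", "38.15"))

# Plain as.numeric() returns the integer codes of the levels.
codes <- as.numeric(f)

# Both recommended forms recover the values written in the labels.
via_levels <- as.numeric(levels(f))[f]
via_char   <- as.numeric(as.character(f))

codes
via_levels
via_char
```

Here codes holds the position of each value among the sorted levels, while via_levels and via_char both hold the original gestational ages.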

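gestwks is a factor; you need to convert it to numeric before running the regression, using as.numeric(as.character(gestwks)) rather than as.numeric(gestwks) alone, which would return the internal level codes.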
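A rough sketch of the effect on the model is shown below. The data frame my_data is hypothetical: the first rows are modelled on the question with shortened gestwks values, and the remaining values are invented so that the model can be fitted.

```r
# Hypothetical data; gestwks is assumed to be stored as a factor.
my_data <- data.frame(
  bweight = c(2974, 3270, 2620, 3751, 3200, 3673, 3100, 2900),
  gestwks = factor(c("38.52", "39.10", "38.15", "39.80",
                     "38.89", "40.97", "38.60", "37.90")),
  hyp     = c(0, 0, 0, 0, 1, 0, 1, 0),
  sex     = c("female", "male", "female", "male",
              "male", "female", "male", "female")
)
my_data$male <- 1 * (my_data$sex == "male")

# With a factor, lm() creates one dummy coefficient per level beyond the first.
fit_factor <- lm(bweight ~ gestwks + hyp + male, data = my_data)
n_gestwks_terms <- sum(grepl("^gestwks", names(coef(fit_factor))))

# After conversion, gestwks enters the model as a single numeric slope.
my_data$gestwks <- as.numeric(as.character(my_data$gestwks))
fit_numeric <- lm(bweight ~ gestwks + hyp + male, data = my_data)
names(coef(fit_numeric))
```

With the numeric column the fit has one intercept and one coefficient each for gestwks, hyp and male, which is the kind of output the question was expecting.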
