
Multiple linear regression model in R

I'm trying to create a multiple linear regression model with this data:

    bweight  gestwks             hyp  sex
1   2974     38.5200004577637    0    female
2   3270     NA                  0    male
3   2620     38.150001525878899  0    female
4   3751     39.799999237060497  0    male
5   3200     38.889999389648402  1    male
6   3673     40.970001220703097  0    female

To include the string-valued variable sex ("male" / "female") in the model, I convert it to a 0/1 integer indicator, like this:

male = 1*(sex == "male")

So, creating the linear model, where babyweight is the outcome variable:

lm2 = lm(bweight ~ gestwks + hyp + male)

But when I look at the parameters of the model, I get this (only part of the output is shown):

Call:
lm(formula = bweight ~ gestwks + hyp + male)

Coefficients:
              (Intercept)  gestwks26.950000762939499
                  864.000                   -236.000
gestwks27.329999923706101    gestwks27.9899997711182
                    7.363                    146.469
gestwks28.040000915527301    gestwks30.5200004577637
                  184.469                    760.469
gestwks30.649999618530298  gestwks30.709999084472699
                  900.000                   -141.531

I expected to get only one coefficient per predictor. What am I doing wrong?

Before conducting any analysis, always explore your variables carefully. Pay attention to ranges and distributions for continuous variables and frequencies for categorical ones. Do this after importing the data.

In this case, the gestwks variable is not actually numeric. R is treating it as categorical (a factor), so lm() estimates a separate coefficient for each distinct value, which is why the output contains names like gestwks26.950000762939499. If you had looked at the output of str(my_data), where my_data is the name of your data frame, you would have seen the potential problem with that variable. You probably need to revise the command that imports the data. If the import command is correct, then you'll need to convert the variable to numeric using the appropriate command. Read the Warning on the help page of as.numeric.*
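As a rough illustration of that check, the sketch below builds a small toy data frame loosely based on the question and shows how str() exposes the problem. The stray "unknown" entry and the rounded values are assumptions made for the example; the question does not show what actually caused the column to be non-numeric. stringsAsFactors = TRUE is set explicitly because R versions before 4.0.0 converted strings to factors by default.

```r
# Toy data, loosely based on the question. The "unknown" entry is an
# ASSUMED cause of the problem, used only for illustration.
my_data <- data.frame(
  bweight = c(2974, 3270, 2620, 3751, 3200, 3673),
  gestwks = c("38.52", "unknown", "38.15", "39.80", "38.89", "40.97"),
  hyp     = c(0, 0, 0, 0, 1, 0),
  sex     = c("female", "male", "female", "male", "male", "female"),
  stringsAsFactors = TRUE
)

str(my_data)            # gestwks is listed as a Factor, not num
class(my_data$gestwks)  # "factor"

# With a factor predictor, lm() estimates one coefficient per level
# (minus the reference level) instead of a single slope.
fit_bad <- lm(bweight ~ gestwks, data = my_data)
gest_coefs <- grep("^gestwks", names(coef(fit_bad)), value = TRUE)
length(gest_coefs)      # 5: six levels minus the reference level
```

A single unparseable entry is enough to make the whole column non-numeric on import, which is why checking str() right after reading the data is worthwhile.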

Data management is a key part of your analysis.

Look carefully at gestwks for strange-looking values. table() can help if there aren't too many records; otherwise, look at the first and last few sorted values.
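One way to do that inspection is sketched below on a hypothetical vector (the values, including "n/a" and "40,97", are made up for the example). The last step, flagging entries that fail to parse as numbers, is an extra suggestion that goes beyond table() and sorting.

```r
# Hypothetical raw values; "n/a" and "40,97" are invented problem entries.
gestwks_raw <- c("38.52", "39.80", "n/a", "38.15", "40,97", "38.52")

table(gestwks_raw)              # frequency of each distinct value
head(sort(gestwks_raw), 3)      # first few sorted values
tail(sort(gestwks_raw), 3)      # last few sorted values

# Flag entries that cannot be parsed as numbers.
parsed <- suppressWarnings(as.numeric(gestwks_raw))
bad <- unique(gestwks_raw[is.na(parsed) & !is.na(gestwks_raw)])
bad                             # "n/a" "40,97"
```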

* as.numeric(levels(f))[f] or as.numeric(as.character(f)) is the recommended command.
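To see why the conversion has to go through the levels, here is a small sketch. The factor f and the data frame d are hypothetical, with values rounded from the question; it assumes the column contains only valid numbers once any stray entries have been dealt with.

```r
f <- factor(c("38.52", "26.95", "39.80", "38.15"))

as.numeric(f)                # 3 1 4 2: internal level codes, NOT the values
as.numeric(as.character(f))  # 38.52 26.95 39.80 38.15
as.numeric(levels(f))[f]     # same values as the line above

# Refit on toy data after converting the factor to numeric.
d <- data.frame(
  bweight = c(2974, 2620, 3751, 3200, 3673),
  gestwks = factor(c("38.52", "38.15", "39.80", "38.89", "40.97")),
  hyp     = c(0, 0, 0, 1, 0),
  male    = c(0, 0, 1, 1, 0)
)
d$gestwks <- as.numeric(as.character(d$gestwks))

fit <- lm(bweight ~ gestwks + hyp + male, data = d)
names(coef(fit))             # "(Intercept)" "gestwks" "hyp" "male"
```

After the conversion the model has a single slope for gestwks, which is what the question was expecting.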

gestwks is a factor; you need to convert it to numeric with as.numeric before running the regression.

