
Logistic regression detection probability

I'm attempting to assess the key covariates of detection probability.

I'm currently using this code:

    model1 <- glm(P ~ Width +
                    MBL +
                    DFT +
                    SGP +
                    SGC +
                    Depth,
                  family = binomial("logit"),
                  data = dframe2, na.action = na.exclude)
    summary.lm(model1)

My data is structured like this:

Site Transect Q  ID   P  Width DFT  Depth    Substrate SGP SGC  MBL
1      Vr1    Q1  1   0    NA  NA   0.5         Sand   0   0    0.00000
2      Vr1    Q2  2   0    NA  NA   1.4 Sand&Searass   1   30   19.14286
3      Vr1    Q3  3   0    NA  NA   1.7 Sand&Searass   1   15   16.00000
4      Vr1    Q4  4   1    17   0   2.0 Sand&Searass   1   95   35.00000
5      Vr1    Q5  5   0    NA  NA   2.4         Sand   0   0    0.00000
6      Vr1    Q6  6   0    NA  NA   2.9 Sand&Searass   1   50   24.85714

My sample size is really small (n=12) and I only have ~70 rows of data.

When I run the code, it returns:

                      Estimate   Std. Error  t value Pr(>|t|)   
(Intercept)            2.457e+01  4.519e+00   5.437  0.00555 **
Width                  1.810e-08  1.641e-01   0.000  1.00000   
MBL                   -2.827e-08  9.906e-02   0.000  1.00000   
DFT                    2.905e-07  1.268e+00   0.000  1.00000   
SGP                    1.064e-06  2.691e+00   0.000  1.00000   
SGC                   -2.703e-09  3.289e-02   0.000  1.00000   
Depth                  1.480e-07  9.619e-01   0.000  1.00000   
SubstrateSand&Searass -8.516e-08  1.626e+00   0.000  1.00000 

Does this mean my data set is just too small to assess detection probability, or am I doing something wrong?

According to Hair (author of the book Multivariate Data Analysis), you need at least 15 examples for each feature (column) of your data. If you have 12, you can only select one feature.

So, run a t-test comparing the means of each feature across the two classes (0 and 1 of the target, i.e. the dependent variable) and choose the feature (independent variable) with the biggest mean difference between classes. That variable is the one best able to create a boundary separating the two classes.
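In R, that screening step might look something like the sketch below. It assumes the data frame and column names from the question (dframe2, with binary response P); the refit with Depth at the end is purely illustrative, and you would substitute whichever covariate ranks first.

    # Candidate covariates from the original model
    covars <- c("Width", "MBL", "DFT", "SGP", "SGC", "Depth")

    # Welch t-test of each covariate between detections (P == 1) and
    # non-detections (P == 0); rank by the absolute t statistic.
    screen <- sapply(covars, function(v) {
      x1 <- dframe2[[v]][dframe2$P == 1]
      x0 <- dframe2[[v]][dframe2$P == 0]
      # Width and DFT are NA whenever P == 0 in the example data, so skip
      # any covariate that is not observed in both classes.
      if (sum(!is.na(x1)) < 2 || sum(!is.na(x0)) < 2) return(NA_real_)
      abs(unname(t.test(x1, x0)$statistic))
    })
    sort(screen, decreasing = TRUE, na.last = TRUE)

    # Refit the logistic regression with only the top-ranked covariate
    # (Depth here is hypothetical -- use whichever variable ranks first):
    model2 <- glm(P ~ Depth, family = binomial("logit"),
                  data = dframe2, na.action = na.exclude)
    summary(model2)   # summary(), not summary.lm(), for a glm fit

With only around 12 positive detections, a single-predictor model is about as much as the data can support, which is in line with the rule of thumb above.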
