简体   繁体   English

R - cox风险模型不包括因子水平

[英]R - cox hazard model not including levels of a factor

I am fitting a cox model to some data that is structured as such: 我将cox模型拟合到一些结构如下的数据:

str(test)
'data.frame':   147 obs. of  8 variables:
 $ AGE              : int  71 69 90 78 61 74 78 78 81 45 ...
 $ Gender           : Factor w/ 2 levels "F","M": 2 1 2 1 2 1 2 1 2 1 ...
 $ RACE             : Factor w/ 5 levels "","BLACK","HISPANIC",..: 5 2 5 5 5 5 5 5 5 1 ...
 $ SIDE             : Factor w/ 2 levels "L","R": 1 1 2 1 2 1 1 1 2 1 ...
 $ LESION.INDICATION: Factor w/ 12 levels "CLAUDICATION",..: 1 11 4 11 9 1 1 11 11 11 ...
 $ RUTH.CLASS       : int  3 5 4 5 4 3 3 5 5 5 ...
 $ LESION.TYPE      : Factor w/ 3 levels "","OCCLUSION",..: 3 3 2 3 3 3 2 3 3 3 ...
 $ Primary          : int  1190 1032 166 689 219 840 1063 115 810 157 ...

the RUTH.CLASS variable is actually a factor, and i've changed it to one as such: RUTH.CLASS变量实际上是一个因素,我已经将其更改为一个:

> test$RUTH.CLASS <- as.factor(test$RUTH.CLASS)
> summary(test$RUTH.CLASS)
 3  4  5  6 
48 56 35  8 

great. 大。

after fitting the model 在拟合模型之后

stent.surv <- Surv(test$Primary)
> cox.ruthclass <- coxph(stent.surv ~ RUTH.CLASS, data=test )
> 
> summary(cox.ruthclass)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS, data = test)

  n= 147, number of events= 147 

              coef exp(coef) se(coef)     z Pr(>|z|)   
RUTH.CLASS4 0.1599    1.1734   0.1987 0.804  0.42111   
RUTH.CLASS5 0.5848    1.7947   0.2263 2.585  0.00974 **
RUTH.CLASS6 0.3624    1.4368   0.3846 0.942  0.34599   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4     1.173     0.8522    0.7948     1.732
RUTH.CLASS5     1.795     0.5572    1.1518     2.796
RUTH.CLASS6     1.437     0.6960    0.6762     3.053

Concordance= 0.574  (se = 0.026 )
Rsquare= 0.045   (max possible= 1 )
Likelihood ratio test= 6.71  on 3 df,   p=0.08156
Wald test            = 7.09  on 3 df,   p=0.06902
Score (logrank) test = 7.23  on 3 df,   p=0.06478

> levels(test$RUTH.CLASS)
[1] "3" "4" "5" "6"

When i fit more variables in the model, similar things happen: 当我在模型中拟合更多变量时,会发生类似的事情:

cox.fit <- coxph(stent.surv ~ RUTH.CLASS + LESION.INDICATION + LESION.TYPE, data=test )
> 
> summary(cox.fit)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS + LESION.INDICATION + 
    LESION.TYPE, data = test)

  n= 147, number of events= 147 

                                          coef exp(coef) se(coef)      z Pr(>|z|)  
RUTH.CLASS4                            -0.5854    0.5569   1.1852 -0.494   0.6214  
RUTH.CLASS5                            -0.1476    0.8627   1.0182 -0.145   0.8847  
RUTH.CLASS6                            -0.4509    0.6370   1.0998 -0.410   0.6818  
LESION.INDICATIONEMBOLIC               -0.4611    0.6306   1.5425 -0.299   0.7650  
LESION.INDICATIONISCHEMIA               1.3794    3.9725   1.1541  1.195   0.2320  
LESION.INDICATIONISCHEMIA/CLAUDICATION  0.2546    1.2899   1.0189  0.250   0.8027  
LESION.INDICATIONREST PAIN              0.5302    1.6993   1.1853  0.447   0.6547  
LESION.INDICATIONTISSUE LOSS            0.7793    2.1800   1.0254  0.760   0.4473  
LESION.TYPEOCCLUSION                   -0.5886    0.5551   0.4360 -1.350   0.1770  
LESION.TYPESTEN                        -0.7895    0.4541   0.4378 -1.803   0.0714 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                                       exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4                               0.5569     1.7956   0.05456     5.684
RUTH.CLASS5                               0.8627     1.1591   0.11726     6.348
RUTH.CLASS6                               0.6370     1.5698   0.07379     5.499
LESION.INDICATIONEMBOLIC                  0.6306     1.5858   0.03067    12.964
LESION.INDICATIONISCHEMIA                 3.9725     0.2517   0.41374    38.141
LESION.INDICATIONISCHEMIA/CLAUDICATION    1.2899     0.7752   0.17510     9.503
LESION.INDICATIONREST PAIN                1.6993     0.5885   0.16645    17.347
LESION.INDICATIONTISSUE LOSS              2.1800     0.4587   0.29216    16.266
LESION.TYPEOCCLUSION                      0.5551     1.8015   0.23619     1.305
LESION.TYPESTEN                           0.4541     2.2023   0.19250     1.071

Concordance= 0.619  (se = 0.028 )
Rsquare= 0.137   (max possible= 1 )
Likelihood ratio test= 21.6  on 10 df,   p=0.01726
Wald test            = 22.23  on 10 df,   p=0.01398
Score (logrank) test = 23.46  on 10 df,   p=0.009161

> levels(test$LESION.INDICATION)
[1] "CLAUDICATION"          "EMBOLIC"               "ISCHEMIA"              "ISCHEMIA/CLAUDICATION"
[5] "REST PAIN"             "TISSUE LOSS"          
> levels(test$LESION.TYPE)
[1] ""          "OCCLUSION" "STEN" 

truncated output from model.matrix below: 从下面的model.matrix截断输出:

> model.matrix(cox.fit)
    RUTH.CLASS4 RUTH.CLASS5 RUTH.CLASS6 LESION.INDICATIONEMBOLIC LESION.INDICATIONISCHEMIA
1             0           0           0                        0                         0
2             0           1           0                        0                         0

We can see that the the first level of each of these is being excluded from the model. 我们可以看到,这些中的每一个的第一级都被排除在模型之外。 Any input would be greatly appreciated. 任何投入将不胜感激。 I noticed that on the LESION.TYPE variable, the blank level "" is not being included, but that is not by design - that should be a NA or something similar. 我注意到在LESION.TYPE变量上,没有包含空白级别"" ,但这不是设计 - 应该是NA或类似的东西。

I'm confused and could use some help with this. 我很困惑,可以用一些帮助。 Thanks. 谢谢。

Factors in any model return coefficients based on a base level (a contrast).Your contrasts default to a base factor. 任何模型中的因子都会根据基准水平(对比度)返回系数。您的contrasts默认为基本因子。 There is no point in calculating a coefficient for the dropped value because the model will return the predictions when that dropped value = 1 given that all the other factor values are 0 (factors are complete and mutually exclusive for every observation). 计算下降值的系数没有意义,因为假设所有其他因子值为0(因子是完整的并且对于每个观察是互斥的),模型将在下降值= 1时返回预测。 You can alter your default contrast by changing the contrasts in your options . 您可以通过更改optionscontrasts来更改默认对比度。

For your coefficients to be versus an average of all factors: 对于您的系数与所有因子的平均值:

options(contrasts=c(unordered="contr.sum", ordered="contr.poly"))

For your coefficients to be versus a specific treatment (what you have above and your default): 对于你的系数与特定治疗(你上面的和你的默认):

options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))

As you can see there are two types of factors in R: unordered (or categorical, eg red, green, blue) and ordered (eg strongly disagree, disagree, no opinion, agree, strongly agree) 正如您所看到的,R中有两种类型的因素:无序(或分类,例如红色,绿色,蓝色)和有序(例如,非常不同意,不同意,没有意见,同意,非常同意)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM