Labelling of ordered factor variables

Question

I am trying to produce a univariate output table using the gtsummary package.

publication_dummytable1_sum <- structure(list(id = 1:10, age = structure(c(3L, 3L, 2L, 3L, 2L, 
2L, 2L, 1L, 1L, 1L), .Label = c("c", "b", "a"), class = c("ordered", 
"factor")), sex = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 2L), .Label = c("F", "M"), class = "factor"), country = structure(c(1L, 
1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("eng", "scot", 
"wale"), class = "factor"), edu = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L), .Label = c("x", "y", "z"), class = "factor"), 
lungfunction = c(45L, 23L, 25L, 45L, 70L, 69L, 90L, 50L, 
62L, 45L), ivdays = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L, 
9L, 15L), no2 = c(40L, 70L, 50L, 60L, 30L, 25L, 80L, 89L, 
10L, 40L), pm25 = c(15L, 20L, 36L, 48L, 25L, 36L, 28L, 15L, 
25L, 15L)), row.names = c(NA, 10L), class = "data.frame")

...
library(gtsummary)
publication_dummytable1_sum %>% 
  select(sex, age, lungfunction, ivdays) %>% 
  tbl_uvregression(
    method = lm,
    y = lungfunction,
    pvalue_fun = ~style_pvalue(.x, digits = 3)
  ) %>% 
  add_global_p() %>%  # add global p-value 
  bold_p() %>%        # bold p-values under a given threshold
  bold_labels()
...

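When I run this code I get the output below. The problem is the labels of the ordered factor variable (age): R chooses its own labels for ordered factors. Is it possible to tell R not to choose its own labels for ordered factor variables?

I would like output like the following: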

Answer 1

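Like many others, I think you may be misunderstanding what an "ordered" factor means in R. In one sense, all factors in R are ordered: estimates and the like are generally printed, plotted, etc. in the order of the levels vector. Declaring a factor to be of type ordered has two main effects: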

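it allows you to evaluate inequalities on the factor levels (e.g. you can filter(age > "b"))
the contrasts default to orthogonal polynomial contrasts, which is where the L (linear) and Q (quadratic) labels come from: see e.g. this CrossValidated answer for more details.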
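As a minimal sketch of those two effects, using only base R: the toy vector below is made up for illustration, with its levels ordered c < b < a as in the question's data.

```r
# Hypothetical toy vector; levels are ordered c < b < a, as in the question
age <- factor(c("a", "a", "b", "c"), levels = c("c", "b", "a"), ordered = TRUE)

# Effect 1: inequalities are evaluated along the level order
above_b <- age > "b"

# Effect 2: the default contrasts for an ordered factor are polynomial
poly_names <- colnames(contrasts(age))

print(above_b)
print(poly_names)
```

Only the "a" values compare as greater than "b", and the contrast columns are named .L and .Q. lm() appends those names to the variable name, which gives age.L and age.Q.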

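If you want this variable to be handled the same way as a regular factor (so that the estimates are differences between each group and the baseline level, i.e. treatment contrasts), you can: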

convert it back to an unordered factor (e.g. factor(age, ordered=FALSE))
specify that you want treatment contrasts in the model (in base R you would specify contrasts = list(age = "contr.treatment"))
set options(contrasts = c(unordered = "contr.treatment", ordered = "contr.treatment")) (the default for ordered is "contr.poly")
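The three options above can be sketched with plain lm() as follows. This is only an illustration: it uses the age and lungfunction columns from the question's data, and the object names d, d_unordered, fit1, fit2 and fit3 are made up here.

```r
# age and lungfunction taken from the question's data; levels are c < b < a
d <- data.frame(
  age = factor(c("a", "a", "b", "a", "b", "b", "b", "c", "c", "c"),
               levels = c("c", "b", "a"), ordered = TRUE),
  lungfunction = c(45, 23, 25, 45, 70, 69, 90, 50, 62, 45)
)

# Option 1: convert back to an unordered factor before fitting
d_unordered <- transform(d, age = factor(age, ordered = FALSE))
fit1 <- lm(lungfunction ~ age, data = d_unordered)

# Option 2: keep the ordered factor, request treatment contrasts in the model
fit2 <- lm(lungfunction ~ age, data = d,
           contrasts = list(age = "contr.treatment"))

# Option 3: change the session-wide default, then restore the old setting
old <- options(contrasts = c(unordered = "contr.treatment",
                             ordered = "contr.treatment"))
fit3 <- lm(lungfunction ~ age, data = d)
options(old)

print(names(coef(fit1)))
```

All three fits should give the same coefficients, named by level (ageb, agea) rather than age.L and age.Q. Option 3 affects every model fitted afterwards in the session, which is why the sketch restores the previous setting.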

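If you have an unordered ("regular") factor and the levels are not in the order you want, you can reset the level order by specifying the levels explicitly, e.g.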

# lambda form; passing extra arguments through across() is deprecated in recent dplyr
mutate(across(age, ~ factor(.x,
   levels = c("0-10 years", "11-20 years", "21-30 years", "30-40 years"))))

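By default R puts factor levels in alphabetical order, which is sometimes not what you want (but I can't think of a case where the order would be "random" ...)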
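A small base R sketch of the difference between the default and an explicit level order. The values "low", "medium" and "high" are made up for illustration.

```r
# Hypothetical character vector whose natural order is not alphabetical
x <- c("low", "medium", "high", "low")

# Default: levels are the sorted unique values
default_levels <- levels(factor(x))

# Explicit: levels appear in the order given
explicit_levels <- levels(factor(x, levels = c("low", "medium", "high")))

print(default_levels)
print(explicit_levels)
```

The default puts "high" first because it sorts first alphabetically. The explicit call keeps low, medium, high, and that order is what decides the baseline level and the row order in model output.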

Answer 2

The easiest way to remove the odd labels for ordered variables is to remove the ordered class from those factor variables. Example below!

library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.4.2'

publication_dummytable1_sum <- 
  structure(
    list(
      id = 1:10,
      age = structure(c(3L, 3L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 1L),
                      .Label = c("c", "b", "a"), class = c("ordered", "factor")),
      sex = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L),
                      .Label = c("F", "M"), class = "factor"),
      country = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L),
                          .Label = c("eng", "scot", "wale"), class = "factor"),
      edu = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
                      .Label = c("x", "y", "z"), class = "factor"),
      lungfunction = c(45L, 23L, 25L, 45L, 70L, 69L, 90L, 50L, 62L, 45L),
      ivdays = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L, 9L, 15L),
      no2 = c(40L, 70L, 50L, 60L, 30L, 25L, 80L, 89L, 10L, 40L),
      pm25 = c(15L, 20L, 36L, 48L, 25L, 36L, 28L, 15L, 25L, 15L)
    ),
    row.names = c(NA, 10L),
    class = "data.frame"
  ) |>
  as_tibble()

# R labels ordered factors like this in lm()
lm(lungfunction ~ age, publication_dummytable1_sum)
#> 
#> Call:
#> lm(formula = lungfunction ~ age, data = publication_dummytable1_sum)
#> 
#> Coefficients:
#> (Intercept)        age.L        age.Q  
#>       51.17       -10.37       -15.11


tbl <-
  publication_dummytable1_sum %>% 
  # remove ordered class
  mutate(across(where(is.ordered), ~factor(., ordered = FALSE))) %>%
  select(sex, age, lungfunction, ivdays) %>% 
  tbl_uvregression(
    method = lm,
    y = lungfunction,
    pvalue_fun = ~style_pvalue(.x, digits = 3)
  )

Created on 2021-07-22 by the reprex package (v2.0.0)

Answer 1: score 3, accepted, 2021-07-22 19:36:14

Answer 2: score 2, 2021-07-22 19:35:55
