What is the correct way to test for significant differences between non-numeric (categorical) data, and which post-hoc test should I use?
I'm working with non-numeric data that looks something like this:
Origin | ESBL |
---|---|
Hospital | ESBL |
Hospital | Non-ESBL |
Hospital | ESBL |
City | ESBL |
Hospital | Non-ESBL |
City | ESBL |
Country | ESBL |
Hospital | ESBL |
I want to test whether there is a statistical association between `Origin` and the variable `ESBL`.
So far I have generated a contingency table in R using:
cont_tab <- table(data$Origin, data$ESBL)
and then ran a chi-squared test of independence:
chi_test <- chisq.test(cont_tab)
The result shows that the two variables are not independent, i.e. there is a significant association:
X-squared = 17.306, df = 2, p-value = 0.0001746
But now I want to know which combinations are responsible for this result (ESBL-Hospital, Non-ESBL-Hospital, ESBL-City and so on).
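A common complement to the omnibus test is to inspect the standardized residuals that `chisq.test()` already returns in its `stdres` component: cells with a residual well beyond about ±2 are the ones pulling the table away from independence. The sketch below is self-contained and assumes the counts printed further down in this question (Hospital 46/122, City 27/21, Country 56/69). Treat it as a descriptive aid rather than a formal post-hoc procedure, since no correction for the six cells examined is applied.

```r
# Counts assumed from the contingency table printed in the question
cont_tab <- as.table(matrix(
  c(46, 122,
    27,  21,
    56,  69),
  nrow = 3, byrow = TRUE,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))
))

chi_test <- chisq.test(cont_tab)
chi_test$statistic  # X-squared, about 17.31 on 2 degrees of freedom

# Standardized residuals: (observed - expected) / sqrt(variance).
# |residual| > 1.96 flags a cell at roughly the 5% level, uncorrected.
round(chi_test$stdres, 2)
```

With only two columns, the two residuals in each row are mirror images, so each origin effectively contributes one number: Hospital has fewer ESBL isolates than expected under independence (about -3.9), City has more (about +2.8), and Country is borderline (about +2.0).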
I have tried running multiple Fisher tests:
library(RVAideMemoire)
multifish <- fisher.multcomp(cont_tab)
But I don't really get what I want:
ESBL Non-ESBL
Hospital 46 122
City 27 21
Country 56 69
Am I doing anything wrong? Is there a better approach for this?
Thanks!!!
I think the "final result" you are showing is actually `cont_tab`. When I run your code, `cont_tab` looks like the result you are showing as the output of `fisher.multcomp`:
cont_tab <- table(data$Origin, data$ESBL)
cont_tab
#>
#> ESBL Non-ESBL
#> Hospital 46 122
#> City 27 21
#> Country 56 69
Whereas, if I run `fisher.multcomp` on `cont_tab`, I get:
library(RVAideMemoire)
fisher.multcomp(cont_tab)
#>
#> Pairwise comparisons using Fisher's exact test for count data
#>
#> data: cont_tab
#>
#> Hospital City
#> City 0.001313 -
#> Country 0.004249 0.234
#>
#> P value adjustment method: fdr
We can see (as expected) that `Hospital` is significantly different from both `City` and `Country`, but there is no significant difference between `City` and `Country`.
Created on 2022-12-13 with reprex v2.0.2
Data inferred from the question:
data <- data.frame(
ESBL = factor(c(rep(c("ESBL", "Non-ESBL"), times = c(46, 122)),
rep(c("ESBL", "Non-ESBL"), times = c(27, 21)),
rep(c("ESBL", 'Non-ESBL'), times = c(56, 69)))),
Origin = factor(rep(c('Hospital', 'City', 'Country'),
times = c(168, 48, 125)),
c('Hospital', 'City', 'Country')))
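To make clear what the pairwise comparisons are doing, here is a minimal base-R sketch. It assumes that `fisher.multcomp` runs Fisher's exact test on each pair of rows of `cont_tab` and then adjusts the p-values with the Benjamini-Hochberg method (`p.adjust(method = "fdr")`, the adjustment named in its output above); that internal behaviour is an assumption, and the names `row_pairs`, `raw_p` and `adj_p` are illustrative only.

```r
# Same counts as in the question (rows = Origin, columns = ESBL status)
cont_tab <- matrix(
  c(46, 122,
    27,  21,
    56,  69),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Hospital", "City", "Country"), c("ESBL", "Non-ESBL"))
)

# Every pair of origins, e.g. c("Hospital", "City")
row_pairs <- combn(rownames(cont_tab), 2, simplify = FALSE)

# Fisher's exact test on the 2 x 2 sub-table for each pair of origins
raw_p <- sapply(row_pairs, function(p) fisher.test(cont_tab[p, ])$p.value)
names(raw_p) <- sapply(row_pairs, paste, collapse = " vs ")

# Benjamini-Hochberg ("fdr") adjustment for the three comparisons
adj_p <- p.adjust(raw_p, method = "fdr")
round(adj_p, 4)
```

Each comparison asks whether two origins differ in their ESBL proportion, which is the question the `fisher.multcomp` table answers: Hospital differs from both City and Country, while City and Country do not differ at the 5% level.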
Take a look at https://pbil.univ-lyon1.fr/ADE-4/ade4-html/discrimin.coa.html for a completely different, academically rigorous approach implemented in the R package ade4. Your variable ESBL would play the role of the class variable in a Discriminant Correspondence Analysis (DCA).