
What is the correct way to test for significant differences between non-numeric (categorical) data, and which post-hoc test is appropriate?

I'm working with non-numeric data that looks something like this:

Origin    ESBL
Hospital  ESBL
Hospital  Non-ESBL
Hospital  ESBL
City      ESBL
Hospital  Non-ESBL
City      ESBL
Country   ESBL
Hospital  ESBL

I want to test whether there is a statistical association between Origin and the ESBL variable.

So far I have tried generating a contingency table in R using:

cont_tab <- table(data$Origin, data$ESBL)

and then running a chi-squared test for independence:

chi_test <- chisq.test(cont_tab)

After this, I find that the two variables are not independent (the association is significant):

X-squared = 17.306, df = 2, p-value = 0.0001746
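A p-value this small rejects the null hypothesis of independence, so Origin and ESBL status appear to be associated. Before leaning on the chi-squared approximation it is worth checking the expected counts (a common rule of thumb is at least 5 per cell), which `chisq.test()` returns in its `expected` component; `fisher.test()` also accepts the full 3 x 2 table as an exact alternative. This sketch re-enters the counts from the table printed later in the question instead of using the original `data`, which is an assumption made to keep the example self-contained.

```r
# Counts re-entered from the table in the question (assumed to match `data`)
cont_tab <- as.table(matrix(
  c(46, 27, 56,    # ESBL: Hospital, City, Country
    122, 21, 69),  # Non-ESBL: Hospital, City, Country
  nrow = 3,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL = c("ESBL", "Non-ESBL"))
))

chi_test <- chisq.test(cont_tab)

# Expected counts under independence: row total * column total / grand total
round(chi_test$expected, 2)

# Rule-of-thumb check: every expected count should be at least 5
all(chi_test$expected >= 5)

# Exact test on the whole 3 x 2 table, useful when expected counts are small
fisher_all <- fisher.test(cont_tab)
fisher_all$p.value
```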

But now I want to know which combinations are responsible for this result (ESBL-Hospital, Non-ESBL-Hospital, ESBL-City, and so on).
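One common way to locate the cells behind a significant chi-squared test is to inspect the standardized (adjusted) residuals, which `chisq.test()` returns as `stdres`; under independence each is approximately standard normal. The sketch below flags cells whose residual exceeds a Bonferroni-corrected cutoff over the six cells. That correction is one conservative choice, not the only one, and the counts are re-entered by hand from the table shown in the question.

```r
cont_tab <- as.table(matrix(
  c(46, 27, 56, 122, 21, 69),  # ESBL column first, then Non-ESBL
  nrow = 3,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL = c("ESBL", "Non-ESBL"))
))

# Standardized residuals: (observed - expected) / estimated standard error
res <- chisq.test(cont_tab)$stdres
round(res, 2)

# Two-sided Bonferroni cutoff for alpha = 0.05 across all cells
crit <- qnorm(1 - 0.05 / (2 * length(res)))

# TRUE marks a cell that departs from independence after correction
abs(res) > crit
```

With these counts, Hospital has fewer ESBL isolates than expected (negative residual) and City has more (positive residual), both beyond the cutoff, while Country's residual stays below it. With only two columns, the two residuals in each row have the same magnitude and opposite signs, so each row effectively contributes a single comparison.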

I have tried running multiple Fisher tests:

library(RVAideMemoire)
multifish <- fisher.multcomp(cont_tab)

But I don't really get what I want:

            ESBL Non-ESBL
  Hospital   46      122
  City       27       21
  Country    56       69

Am I doing anything wrong? Is there a better approach for this?

Thanks!

I think the "final result" you are showing is actually cont_tab. When I run your code, cont_tab looks exactly like the output you attribute to fisher.multcomp:

cont_tab <- table(data$Origin, data$ESBL)

cont_tab
#>           
#>            ESBL Non-ESBL
#>   Hospital   46      122
#>   City       27       21
#>   Country    56       69

Whereas if I run fisher.multcomp on cont_tab, I get:

library(RVAideMemoire)

fisher.multcomp(cont_tab)
#> 
#>         Pairwise comparisons using Fisher's exact test for count data
#> 
#> data:  cont_tab
#> 
#>         Hospital  City
#> City    0.001313     -
#> Country 0.004249 0.234
#> 
#> P value adjustment method: fdr

As expected, Hospital is significantly different from both City and Country, but there is no significant difference between City and Country.
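For readers who cannot install RVAideMemoire, the same idea can be sketched in base R: run `fisher.test()` on each 2 x 2 sub-table formed by a pair of origins, then adjust the p-values with `p.adjust()`. Using `method = "fdr"` mirrors the adjustment named in the output above. This is a hand-rolled approximation of that analysis, not the package's own implementation, and the counts are typed in from the table shown earlier.

```r
cont_tab <- as.table(matrix(
  c(46, 27, 56, 122, 21, 69),  # ESBL column first, then Non-ESBL
  nrow = 3,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL = c("ESBL", "Non-ESBL"))
))

# Every pair of origins: Hospital/City, Hospital/Country, City/Country
pairs <- combn(rownames(cont_tab), 2, simplify = FALSE)

# Fisher's exact test on the 2 x 2 sub-table for each pair
raw_p <- sapply(pairs, function(pr) fisher.test(cont_tab[pr, ])$p.value)
names(raw_p) <- sapply(pairs, paste, collapse = " vs ")

# Benjamini-Hochberg false discovery rate adjustment ("fdr" is an alias of "BH")
adj_p <- p.adjust(raw_p, method = "fdr")
round(adj_p, 4)
```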

Created on 2022-12-13 with reprex v2.0.2


Data inferred from the question:

# Rebuild individual records whose counts match cont_tab
data <- data.frame(
  ESBL = factor(c(rep(c("ESBL", "Non-ESBL"), times = c(46, 122)),   # Hospital
                  rep(c("ESBL", "Non-ESBL"), times = c(27, 21)),    # City
                  rep(c("ESBL", "Non-ESBL"), times = c(56, 69)))),  # Country
  Origin = factor(rep(c("Hospital", "City", "Country"),
                      times = c(168, 48, 125)),
                  levels = c("Hospital", "City", "Country")))
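The pairwise p-values say which origins differ but not in which direction. Row proportions of the contingency table fill that gap: `prop.table()` with `margin = 1` divides each count by its row total. The counts below are re-entered from the table above rather than computed from `data`, an assumption made so the snippet runs on its own.

```r
cont_tab <- as.table(matrix(
  c(46, 27, 56, 122, 21, 69),  # ESBL column first, then Non-ESBL
  nrow = 3,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL = c("ESBL", "Non-ESBL"))
))

# Proportion of ESBL and Non-ESBL isolates within each origin (rows sum to 1)
row_props <- prop.table(cont_tab, margin = 1)
round(row_props, 3)
```

Here roughly 27% of Hospital isolates are ESBL, against about 56% for City and 45% for Country, which is consistent with Hospital standing apart in the pairwise tests.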

Take a look at https://pbil.univ-lyon1.fr/ADE-4/ade4-html/discrimin.coa.html for a completely different, academically rigorous approach implemented in the R package ade4. Your ESBL variable would play the role of the class variable in a Discriminant Correspondence Analysis (DCA).
