Which is the correct way to test for significant differences between non-numeric data? Which is the correct post hoc test?

Question

I'm working with non-numeric data that looks something like this:

Origin	ESBL
Hospital	ESBL
Hospital	Non-ESBL
Hospital	ESBL
City	ESBL
Hospital	Non-ESBL
City	ESBL
Country	ESBL
Hospital	ESBL

I want to test whether there is a statistical association between the origin and the variable ESBL.

So far I have tried generating a contingency table in R using:

cont_tab<-table(data$Origin, data$ESBL)

and then running a chi-squared test of independence:

chi_test<-chisq.test(cont_tab)

From this, I conclude that Origin and ESBL are indeed not independent (there is an association):

X-squared = 17.306, df = 2, p-value = 0.0001746
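To see where X-squared = 17.306 and df = 2 come from, here is a minimal base-R sketch that rebuilds the table from the counts printed further down in the question (assuming those counts are the complete data set) and computes Pearson's statistic by hand. The object names are illustrative only.

```r
# Rebuild the 3 x 2 table from the counts in the question
# (rows = Origin, columns = ESBL status); assumed to be the full data.
cont_tab <- as.table(matrix(
  c(46, 27, 56,     # ESBL
    122, 21, 69),   # Non-ESBL
  ncol = 2,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))
))

n <- sum(cont_tab)
# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(cont_tab), colSums(cont_tab)) / n

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2  <- sum((cont_tab - expected)^2 / expected)
dof <- (nrow(cont_tab) - 1) * (ncol(cont_tab) - 1)
p   <- pchisq(x2, df = dof, lower.tail = FALSE)

round(c(X2 = x2, df = dof, p = p), 6)
```

Because the table is larger than 2 x 2, chisq.test() applies no continuity correction here, so the hand-computed x2 should agree with chisq.test(cont_tab)$statistic.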

But now I want to know which combinations are responsible for this value (ESBL-Hospital, Non-ESBL-Hospital, ESBL-City, and so on).
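One common way to see which cells drive a significant chi-squared result (a different technique from pairwise Fisher tests) is to inspect the adjusted, or standardized, residuals that chisq.test() already returns as $stdres. Under independence each is roughly standard normal, so a large positive value marks more cases than expected and a large negative value fewer. This is a sketch assuming the counts shown later in the question; the Bonferroni cutoff over all six cells is just one conservative choice, not the only valid one.

```r
# Counts from the question (rows = Origin, columns = ESBL status)
cont_tab <- as.table(matrix(
  c(46, 27, 56, 122, 21, 69), ncol = 2,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))))

chi_test <- chisq.test(cont_tab)

# Standardized (adjusted) residuals, approximately N(0, 1) under independence
res <- chi_test$stdres
round(res, 2)

# Two-sided Bonferroni-adjusted cutoff across all cells of the table
alpha  <- 0.05
cutoff <- qnorm(1 - alpha / (2 * length(res)))

# Cells whose residual exceeds the cutoff in absolute value
which(abs(res) > cutoff, arr.ind = TRUE)
```

With only two ESBL categories, the two residuals in each row have the same magnitude and opposite sign, so each origin effectively yields one comparison. With these counts the sketch flags Hospital (fewer ESBL than expected) and City (more ESBL than expected), but not Country.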

I have tried running multiple pairwise Fisher's exact tests:

library(RVAideMemoire)
multifish<-fisher.multcomp(cont_tab)

But I don't really get what I want:

            ESBL Non-ESBL
  Hospital   46      122
  City       27       21
  Country    56       69

Am I doing anything wrong? Is there a better approach for this?

Thanks!

Answer 1

I think the "final result" you are showing is actually cont_tab. When I run your code, cont_tab looks like the result you are showing as the output of fisher.multcomp:

cont_tab <- table(data$Origin, data$ESBL)

cont_tab
#>           
#>            ESBL Non-ESBL
#>   Hospital   46      122
#>   City       27       21
#>   Country    56       69

Whereas, if I run fisher.multcomp on cont_tab, I get:

library(RVAideMemoire)

fisher.multcomp(cont_tab)
#> 
#>         Pairwise comparisons using Fisher's exact test for count data
#> 
#> data:  cont_tab
#> 
#>         Hospital  City
#> City    0.001313     -
#> Country 0.004249 0.234
#> 
#> P value adjustment method: fdr

We can see (as expected) that Hospital is significantly different from both City and Country, but there is no significant difference between City and Country.
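For comparison, the same idea can be sketched in base R: run fisher.test() on the 2 x 2 sub-table for each pair of origins, then adjust the p-values with p.adjust(method = "fdr"), the adjustment method reported in the output above. This is an assumption about what fisher.multcomp() does, not its actual source, so treat it as an illustration of the technique; the counts are the ones from the question.

```r
# Counts from the question (rows = Origin, columns = ESBL status)
cont_tab <- as.table(matrix(
  c(46, 27, 56, 122, 21, 69), ncol = 2,
  dimnames = list(Origin = c("Hospital", "City", "Country"),
                  ESBL   = c("ESBL", "Non-ESBL"))))

# All pairs of origins, e.g. c("Hospital", "City")
origin_pairs <- combn(rownames(cont_tab), 2, simplify = FALSE)

# Fisher's exact test on the 2 x 2 sub-table for each pair of rows
raw_p <- sapply(origin_pairs, function(pr) fisher.test(cont_tab[pr, ])$p.value)
names(raw_p) <- sapply(origin_pairs, paste, collapse = " vs ")

# Control the false discovery rate across the three comparisons
adj_p <- p.adjust(raw_p, method = "fdr")
round(adj_p, 6)
```

The pattern should match the table above: Hospital differs from both City and Country, while City vs Country is not significant.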

Created on 2022-12-13 with reprex v2.0.2

Data inferred from the question:

data <- data.frame(
  ESBL = factor(c(rep(c("ESBL", "Non-ESBL"), times = c(46, 122)),
                  rep(c("ESBL", "Non-ESBL"), times = c(27, 21)),
                  rep(c("ESBL", 'Non-ESBL'), times = c(56, 69)))),
  Origin = factor(rep(c('Hospital', 'City', 'Country'), 
                      times =  c(168, 48, 125)), 
                  c('Hospital', 'City', 'Country')))

Answer 2

Take a look at https://pbil.univ-lyon1.fr/ADE-4/ade4-html/discrimin.coa.html for a completely different, academically rigorous approach implemented in the R package ade4. Your variable ESBL would play the role of the class variable in a Discriminant Correspondence Analysis (DCA).

Answer 1: 0 votes, accepted, 2022-12-13 14:52:26

Answer 2: 0 votes, 2023-01-04 12:33:56
