How can I define a variable as a comparator group containing every value except X?

Question

I have a database (3000+ entries) with one column giving the country of occurrence and another giving the type of event that happened. It is structured like the example below:

#>   country           event
#> 1     USA           Abdominal discomfort
#> 2     USA           Abdominal discomfort
#> 3  Canada           Vomiting
#> 4      UK           Alopecia
#> 5  Hungary          Agitation     
#> 6  France           Abscess

So, as you can imagine, I have thousands of different variables!

I want to compute the values for a contingency table like this:

     X event   Not X event
S    a         b
C    c         d

S = subset of interest (for example, United States)
C = comparator (all other countries in the database)

My main question is: how can I write code that creates a new variable defining the group of all countries in the database except the US, or a new variable defining the group of all events that are not the "X event", so that I can get the values of b and d?

The final goal of this study is to see whether a given event happens more frequently in one country than in the others.

Thanks, and sorry if I am not expressing myself in the most correct way; I am completely new to RStudio and coding.

Answer 1

It sounds like your data is in this format:

df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation', 
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))
df
#>    country          event
#> 1      USA       Vomiting
#> 2      USA      Agitation
#> 3   Canada       Headache
#> 4       UK       Headache
#> 5       UK Abdominal pain
#> 6   France       Vomiting
#> 7  Hungary      Agitation
#> 8  Hungary Abdominal pain
#> 9   France    Anaphylaxis
#> 10  Canada       Vomiting

That being the case, you would need one table for each condition in each country, which may result in dozens or hundreds of different tables to process (for example, even this little toy example needs 25 tables: 5 countries times 5 events).
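Before looping over every combination, it may help to see how a single one of those tables can be built. The following is only a minimal sketch: `target_country`, `target_event`, `group` and `event_x` are names made up for this illustration, and the data is the toy data from above. It recodes each column into two levels (the target versus everything else), which is one way to obtain the comparator groups and therefore the cells b and d.

```r
# Toy data copied from the example above
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# Hypothetical choices, for illustration only
target_country <- "USA"
target_event   <- "Vomiting"

# Two-level variables: the target versus "everything else".
# Setting the levels explicitly fixes the row/column order of the table.
group   <- factor(ifelse(df$country == target_country, "S", "C"),
                  levels = c("S", "C"))
event_x <- factor(ifelse(df$event == target_event, "X", "Not X"),
                  levels = c("X", "Not X"))

# 2x2 contingency table laid out as in the question: rows S/C, columns X/Not X
tab <- table(group, event_x)
print(tab)

# Individual cells, named as in the question
a   <- tab["S", "X"]      # target country, target event
b   <- tab["S", "Not X"]  # target country, any other event
c_  <- tab["C", "X"]      # other countries, target event (c_ avoids masking c())
d   <- tab["C", "Not X"]  # other countries, any other event
```

For the toy data this gives a = 1, b = 1, c = 2 and d = 6. Changing `target_country` or `target_event` produces the table for any other pair.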

To help summarize your data, you could directly get the odds ratio and p value for each condition in each country by creating each table inside an lapply call and running a Fisher test on it to extract the statistics. Then you can turn the whole thing into a data frame:

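# One row per event/country pair: build the 2x2 table, then run Fisher's exact test
`row.names<-`(do.call(rbind, lapply(sort(unique(df$event)), function(x) {
  do.call(rbind, lapply(sort(unique(df$country)), function(y) {
    # Rows: is the country y? Columns: is the event x? (FALSE first, TRUE second)
    tab <- table(df$country == y, df$event == x)
    ft <- fisher.test(tab)
    # event/no_event: counts in country y; ROW_*: counts in all other countries
    data.frame(country = y, condition = x, event = tab[2, 2], no_event = tab[2, 1],
               ROW_event = tab[1, 2], ROW_no_event = tab[1, 1],
               odds_ratio = ft$estimate,
               p_value = ft$p.value)
  }))
})), NULL)  # reset the row names inherited from ft$estimate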
#>    country      condition event no_event ROW_event ROW_no_event odds_ratio
#> 1   Canada Abdominal pain     0        2         2            6   0.000000
#> 2   France Abdominal pain     0        2         2            6   0.000000
#> 3  Hungary Abdominal pain     1        1         1            7   5.291552
#> 4       UK Abdominal pain     1        1         1            7   5.291552
#> 5      USA Abdominal pain     0        2         2            6   0.000000
#> 6   Canada      Agitation     0        2         2            6   0.000000
#> 7   France      Agitation     0        2         2            6   0.000000
#> 8  Hungary      Agitation     1        1         1            7   5.291552
#> 9       UK      Agitation     0        2         2            6   0.000000
#> 10     USA      Agitation     1        1         1            7   5.291552
#> 11  Canada    Anaphylaxis     0        2         1            7   0.000000
#> 12  France    Anaphylaxis     1        1         0            8        Inf
#> 13 Hungary    Anaphylaxis     0        2         1            7   0.000000
#> 14      UK    Anaphylaxis     0        2         1            7   0.000000
#> 15     USA    Anaphylaxis     0        2         1            7   0.000000
#> 16  Canada       Headache     1        1         1            7   5.291552
#> 17  France       Headache     0        2         2            6   0.000000
#> 18 Hungary       Headache     0        2         2            6   0.000000
#> 19      UK       Headache     1        1         1            7   5.291552
#> 20     USA       Headache     0        2         2            6   0.000000
#> 21  Canada       Vomiting     1        1         2            6   2.645752
#> 22  France       Vomiting     1        1         2            6   2.645752
#> 23 Hungary       Vomiting     0        2         3            5   0.000000
#> 24      UK       Vomiting     0        2         3            5   0.000000
#> 25     USA       Vomiting     1        1         2            6   2.645752
#>      p_value
#> 1  1.0000000
#> 2  1.0000000
#> 3  0.3777778
#> 4  0.3777778
#> 5  1.0000000
#> 6  1.0000000
#> 7  1.0000000
#> 8  0.3777778
#> 9  1.0000000
#> 10 0.3777778
#> 11 1.0000000
#> 12 0.2000000
#> 13 1.0000000
#> 14 1.0000000
#> 15 1.0000000
#> 16 0.3777778
#> 17 1.0000000
#> 18 1.0000000
#> 19 0.3777778
#> 20 1.0000000
#> 21 1.0000000
#> 22 1.0000000
#> 23 1.0000000
#> 24 1.0000000
#> 25 1.0000000

Created on 2022-04-14 by the reprex package (v2.0.1)
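To sanity-check one row of the summary by hand, the same table and test can be run for a single pair. This is just a sketch on the toy data, using USA and Vomiting as an arbitrary example; the two values it extracts should agree with row 25 of the output above (odds ratio 2.645752, p value 1).

```r
# Toy data copied from the example above
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# 2x2 table for one pair: rows = "is USA?", columns = "is Vomiting?"
tab <- table(df$country == "USA", df$event == "Vomiting")

# Fisher's exact test on that single table
ft <- fisher.test(tab)

# The two statistics kept in the summary data frame
odds_ratio <- unname(ft$estimate)  # conditional maximum likelihood estimate
p_value    <- ft$p.value

print(c(odds_ratio = odds_ratio, p_value = p_value))
```

Note that `fisher.test` reports the conditional maximum likelihood estimate of the odds ratio, so it will generally differ a little from the simple cross-product (a * d) / (b * c).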
