How can I define variables as a comparator group including all variables except X?
I have a database with one column holding the country where each event occurred and another column holding the type of event (3000+ entries). It is structured like the example below:
#> country event
#> 1 USA Abdominal discomfort
#> 2 USA Abdominal discomfort
#> 3 Canada Vomiting
#> 4 UK Alopecia
#> 5 Hungary Agitation
#> 6 France Abscess
So, as you can imagine, I have thousands of different values!
I want to compute the values for a contingency table like this:
| | X event | Not X event |
|---|---|---|
| S | a | b |
| C | c | d |
S = subset of interest (for example, the United States)
C = comparator (all other countries in the database)
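Once a, b, c and d are known, the usual single-number summary of this table is the odds ratio. The formula below is a sketch that assumes the odds ratio is the measure of interest, as in the answer. A value above 1 means event X is relatively more common in S than in C. Note that R's `fisher.test` reports a conditional maximum-likelihood estimate of the odds ratio, so its value will generally differ somewhat from this simple sample ratio.

$$
\mathrm{OR} = \frac{a / b}{c / d} = \frac{a\,d}{b\,c}
$$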
My main question is: how can I write code that creates a new variable defining the group of all countries in the database except the US, or a new variable defining the group of all events that are not "X event", so that I can get the values of b and d?
The final goal of this study is to see whether a particular event happens more frequently in one country than in the others.
Thanks, and sorry if I am not expressing myself in the most correct way; I am completely new to RStudio and coding.
It sounds like your data is in this format:
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
'Hungary', 'Hungary', 'France', 'Canada'),
event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
'Abdominal pain', 'Vomiting', 'Agitation',
'Abdominal pain', 'Anaphylaxis', 'Vomiting'))
df
#> country event
#> 1 USA Vomiting
#> 2 USA Agitation
#> 3 Canada Headache
#> 4 UK Headache
#> 5 UK Abdominal pain
#> 6 France Vomiting
#> 7 Hungary Agitation
#> 8 Hungary Abdominal pain
#> 9 France Anaphylaxis
#> 10 Canada Vomiting
That being the case, you would need one table for each condition in each country, which may result in dozens or hundreds of different tables to process (even this little toy example, with 5 countries and 5 events, needs 25 tables).
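To answer the question directly, a single one of those tables can be built by recoding each column into "of interest" vs. "everything else". The sketch below assumes the USA as the country of interest and Vomiting as event X; the column names `group` and `event_group` are illustrative, not part of the original answer.

```r
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# New grouping variables (illustrative names): "S" marks the country of interest
# and "C" every other country; "X" marks the event of interest and "not X" every other event
df$group <- factor(ifelse(df$country == "USA", "S", "C"), levels = c("S", "C"))
df$event_group <- factor(ifelse(df$event == "Vomiting", "X", "not X"),
                         levels = c("X", "not X"))

# 2x2 contingency table laid out as in the question: a, b on the S row; c, d on the C row
tab <- table(df$group, df$event_group)
tab
```

For the toy data this gives a = 1, b = 1, c = 2 and d = 6. Fixing the factor levels keeps the S/C and X/not X layout even when one of the cells is zero.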
To help summarize your data, you can get the odds ratio and p value for each condition in each country directly by creating each table inside an `lapply` call and running a Fisher test on it to extract the statistics. Then you can turn the whole thing into a data frame:
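# For every event (x) and every country (y), build the 2x2 table
# (country == y vs. rest of world) x (event == x vs. any other event)
`row.names<-`(do.call(rbind, lapply(sort(unique(df$event)), function(x) {
  do.call(rbind, lapply(sort(unique(df$country)), function(y) {
    tab <- table(df$country == y, df$event == x)  # rows/cols ordered FALSE, TRUE
    ft <- fisher.test(tab)
    data.frame(country = y, condition = x,
               event = tab[2, 2],         # a: country y, event x
               no_event = tab[2, 1],      # b: country y, any other event
               ROW_event = tab[1, 2],     # c: rest of world, event x
               ROW_no_event = tab[1, 1],  # d: rest of world, any other event
               odds_ratio = ft$estimate,
               p_value = ft$p.value)
  }))
})), NULL)  # drop the duplicated "odds ratio" row names left by rbind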
#> country condition event no_event ROW_event ROW_no_event odds_ratio
#> 1 Canada Abdominal pain 0 2 2 6 0.000000
#> 2 France Abdominal pain 0 2 2 6 0.000000
#> 3 Hungary Abdominal pain 1 1 1 7 5.291552
#> 4 UK Abdominal pain 1 1 1 7 5.291552
#> 5 USA Abdominal pain 0 2 2 6 0.000000
#> 6 Canada Agitation 0 2 2 6 0.000000
#> 7 France Agitation 0 2 2 6 0.000000
#> 8 Hungary Agitation 1 1 1 7 5.291552
#> 9 UK Agitation 0 2 2 6 0.000000
#> 10 USA Agitation 1 1 1 7 5.291552
#> 11 Canada Anaphylaxis 0 2 1 7 0.000000
#> 12 France Anaphylaxis 1 1 0 8 Inf
#> 13 Hungary Anaphylaxis 0 2 1 7 0.000000
#> 14 UK Anaphylaxis 0 2 1 7 0.000000
#> 15 USA Anaphylaxis 0 2 1 7 0.000000
#> 16 Canada Headache 1 1 1 7 5.291552
#> 17 France Headache 0 2 2 6 0.000000
#> 18 Hungary Headache 0 2 2 6 0.000000
#> 19 UK Headache 1 1 1 7 5.291552
#> 20 USA Headache 0 2 2 6 0.000000
#> 21 Canada Vomiting 1 1 2 6 2.645752
#> 22 France Vomiting 1 1 2 6 2.645752
#> 23 Hungary Vomiting 0 2 3 5 0.000000
#> 24 UK Vomiting 0 2 3 5 0.000000
#> 25 USA Vomiting 1 1 2 6 2.645752
#> p_value
#> 1 1.0000000
#> 2 1.0000000
#> 3 0.3777778
#> 4 0.3777778
#> 5 1.0000000
#> 6 1.0000000
#> 7 1.0000000
#> 8 0.3777778
#> 9 1.0000000
#> 10 0.3777778
#> 11 1.0000000
#> 12 0.2000000
#> 13 1.0000000
#> 14 1.0000000
#> 15 1.0000000
#> 16 0.3777778
#> 17 1.0000000
#> 18 1.0000000
#> 19 0.3777778
#> 20 1.0000000
#> 21 1.0000000
#> 22 1.0000000
#> 23 1.0000000
#> 24 1.0000000
#> 25 1.0000000
Created on 2022-04-14 by the reprex package (v2.0.1)
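If the nested `lapply` calls are hard to follow, the same counts can be produced by a flat alternative: a small per-pair function applied to every country/event combination, with the results sorted so the most notable pairs come first. This is a sketch, not the answer's method; `pair_stats` is a hypothetical helper name, and the a/b/c/d columns follow the layout of the question's table.

```r
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# Hypothetical helper (not part of the original answer): counts a, b, c, d for
# one country/event pair plus the Fisher test results
pair_stats <- function(df, country, event) {
  in_s <- df$country == country  # TRUE = subset of interest (S), FALSE = comparator (C)
  is_x <- df$event == event      # TRUE = event X, FALSE = any other event
  n_a <- sum(in_s & is_x)
  n_b <- sum(in_s & !is_x)
  n_c <- sum(!in_s & is_x)
  n_d <- sum(!in_s & !is_x)
  # 2x2 matrix laid out as in the question: rows S, C; columns X, not X
  tab <- matrix(c(n_a, n_c, n_b, n_d), nrow = 2,
                dimnames = list(c("S", "C"), c("X", "not X")))
  ft <- fisher.test(tab)
  data.frame(country = country, condition = event,
             a = n_a, b = n_b, c = n_c, d = n_d,
             odds_ratio = unname(ft$estimate), p_value = ft$p.value)
}

# Every country/event combination, one result row per pair
pairs <- expand.grid(country = sort(unique(df$country)),
                     event = sort(unique(df$event)),
                     stringsAsFactors = FALSE)
res <- do.call(rbind, Map(pair_stats, list(df), pairs$country, pairs$event))

# Smallest p-values first
res <- res[order(res$p_value), ]
rownames(res) <- NULL
head(res)
```

Computing a, b, c and d with logical vectors and `!` avoids creating extra columns in `df`; the counts and statistics should match the rows of the table above.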