I have a database (3000+ entries) with one column giving the country where an event occurred and another column giving the type of event. It is structured like the example below:
#> country event
#> 1 USA Abdominal discomfort
#> 2 USA Abdominal discomfort
#> 3 Canada Vomiting
#> 4 UK Alopecia
#> 5 Hungary Agitation
#> 6 France Abscess
So, as you can imagine, I have thousands of different values!
I want to compute the values for a contingency table like this:
| | X event | Not X event |
|---|---|---|
| S | a | b |
| C | c | d |
S = subset of interest (for example, United States)
C = comparator (all other countries in the database)
My main question is: how can I write code that creates a new variable grouping all countries in the database except the US, or a new variable grouping all events that are not the "X event", so that I can get the values of b and d?
The final goal of this study is to see whether a particular event happens more frequently in one country than in the others.
Thanks and sorry if I am not expressing myself in the most correct way, I am completely new to RStudio and coding.
It sounds like your data is in this format:
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))
df
#> country event
#> 1 USA Vomiting
#> 2 USA Agitation
#> 3 Canada Headache
#> 4 UK Headache
#> 5 UK Abdominal pain
#> 6 France Vomiting
#> 7 Hungary Agitation
#> 8 Hungary Abdominal pain
#> 9 France Anaphylaxis
#> 10 Canada Vomiting
That being the case, you would require one table for each condition in each country, which may result in dozens or hundreds of different tables to process (for example, there would be 25 tables even for this little toy example).
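For a single country and a single event, the table from the question could be built roughly as follows. This is only a sketch on the toy data: the names `country_of_interest`, `event_of_interest`, `group` and `event_group` are made up for illustration, and 'USA' and 'Vomiting' are arbitrary choices.

```r
# Toy data, same as above
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# Assumed choices for illustration
country_of_interest <- 'USA'
event_of_interest <- 'Vomiting'

# New variable: S = country of interest, C = every other country
df$group <- factor(ifelse(df$country == country_of_interest, 'S', 'C'),
                   levels = c('S', 'C'))

# New variable: the event of interest versus every other event
df$event_group <- factor(ifelse(df$event == event_of_interest,
                                'X event', 'Not X event'),
                         levels = c('X event', 'Not X event'))

# 2 x 2 contingency table with cells a, b (row S) and c, d (row C)
tab <- table(df$group, df$event_group)
tab
```

Setting the factor levels explicitly fixes the row and column order, so the cells land in the same positions as a, b, c and d in the question. Individual cells can then be read by name, for example `tab['S', 'Not X event']` for b and `tab['C', 'Not X event']` for d.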
To help summarize your data, you could directly get the odds ratio and p-value for each condition in each country by creating each table inside an lapply call and running Fisher's exact test on it to extract the statistics. Then you can turn the whole thing into a data frame:
`row.names<-`(do.call(rbind, lapply(sort(unique(df$event)), function(x) {
  do.call(rbind, lapply(sort(unique(df$country)), function(y) {
    # 2 x 2 table: rows = "country is y" (FALSE, TRUE),
    # columns = "event is x" (FALSE, TRUE)
    tab <- table(df$country == y, df$event == x)
    ft <- fisher.test(tab)
    # event / no_event are counts in country y;
    # ROW_event / ROW_no_event are counts in all other countries
    data.frame(country = y, condition = x, event = tab[2, 2], no_event = tab[2, 1],
               ROW_event = tab[1, 2], ROW_no_event = tab[1, 1],
               odds_ratio = ft$estimate,
               p_value = ft$p.value)
  }))
})), NULL)
#> country condition event no_event ROW_event ROW_no_event odds_ratio
#> 1 Canada Abdominal pain 0 2 2 6 0.000000
#> 2 France Abdominal pain 0 2 2 6 0.000000
#> 3 Hungary Abdominal pain 1 1 1 7 5.291552
#> 4 UK Abdominal pain 1 1 1 7 5.291552
#> 5 USA Abdominal pain 0 2 2 6 0.000000
#> 6 Canada Agitation 0 2 2 6 0.000000
#> 7 France Agitation 0 2 2 6 0.000000
#> 8 Hungary Agitation 1 1 1 7 5.291552
#> 9 UK Agitation 0 2 2 6 0.000000
#> 10 USA Agitation 1 1 1 7 5.291552
#> 11 Canada Anaphylaxis 0 2 1 7 0.000000
#> 12 France Anaphylaxis 1 1 0 8 Inf
#> 13 Hungary Anaphylaxis 0 2 1 7 0.000000
#> 14 UK Anaphylaxis 0 2 1 7 0.000000
#> 15 USA Anaphylaxis 0 2 1 7 0.000000
#> 16 Canada Headache 1 1 1 7 5.291552
#> 17 France Headache 0 2 2 6 0.000000
#> 18 Hungary Headache 0 2 2 6 0.000000
#> 19 UK Headache 1 1 1 7 5.291552
#> 20 USA Headache 0 2 2 6 0.000000
#> 21 Canada Vomiting 1 1 2 6 2.645752
#> 22 France Vomiting 1 1 2 6 2.645752
#> 23 Hungary Vomiting 0 2 3 5 0.000000
#> 24 UK Vomiting 0 2 3 5 0.000000
#> 25 USA Vomiting 1 1 2 6 2.645752
#> p_value
#> 1 1.0000000
#> 2 1.0000000
#> 3 0.3777778
#> 4 0.3777778
#> 5 1.0000000
#> 6 1.0000000
#> 7 1.0000000
#> 8 0.3777778
#> 9 1.0000000
#> 10 0.3777778
#> 11 1.0000000
#> 12 0.2000000
#> 13 1.0000000
#> 14 1.0000000
#> 15 1.0000000
#> 16 0.3777778
#> 17 1.0000000
#> 18 1.0000000
#> 19 0.3777778
#> 20 1.0000000
#> 21 1.0000000
#> 22 1.0000000
#> 23 1.0000000
#> 24 1.0000000
#> 25 1.0000000
Created on 2022-04-14 by the reprex package (v2.0.1)
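With 3000+ entries the result will have many more rows than this, so it may help to store it and sort it by p-value to see which country/event pairs stand out. The following is a minimal sketch on the same toy data, not part of the code above: `one_test`, `pairs` and `res` are hypothetical names, and it keeps only the odds ratio and p-value columns.

```r
# Toy data, same as above
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# Hypothetical helper: one result row for country y and event x
one_test <- function(y, x) {
  # Fixing the factor levels keeps the table 2 x 2 even if a country
  # or an event happens to account for every row
  tab <- table(factor(df$country == y, levels = c(FALSE, TRUE)),
               factor(df$event == x, levels = c(FALSE, TRUE)))
  ft <- fisher.test(tab)
  data.frame(country = y, condition = x,
             odds_ratio = unname(ft$estimate),
             p_value = ft$p.value)
}

# Every country/event combination
pairs <- expand.grid(country = sort(unique(df$country)),
                     event = sort(unique(df$event)),
                     stringsAsFactors = FALSE)

res <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  one_test(pairs$country[i], pairs$event[i])
}))

# Smallest p-values first
res <- res[order(res$p_value), ]
rownames(res) <- NULL
head(res)
```

On the toy data the first row is France / Anaphylaxis with a p-value of 0.2, which matches row 12 of the table above.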