How can I define variables as a comparator group including all variables except X?

Question

I have a database with one column giving the country of occurrence and another giving the type of event that happened (3000+ entries). It is structured like the example below:

#>   country           event
#> 1     USA           Abdominal discomfort
#> 2     USA           Abdominal discomfort
#> 3  Canada           Vomiting
#> 4      UK           Alopecia
#> 5  Hungary          Agitation     
#> 6  France           Abscess

So, as you can imagine I have thousands of different variables!

I want to compute the values for a contingency table like this:

	X event	Not X event
S	a	b
C	c	d

S = subset of interest (for example, United States)
C = comparator (all other countries in the database)

My main question is: how can I write code that creates a new variable defining the group of all countries in the database except the US, or a new variable defining the group of all events that are not the "X event", so that I can get the values of b and d?

The final goal of this study is to see whether a given event happens more frequently in one country than in the others.

Thanks and sorry if I am not expressing myself in the most correct way, I am completely new to RStudio and coding.

Answer 1

It sounds like your data is in this format:

df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation', 
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))
df
#>    country          event
#> 1      USA       Vomiting
#> 2      USA      Agitation
#> 3   Canada       Headache
#> 4       UK       Headache
#> 5       UK Abdominal pain
#> 6   France       Vomiting
#> 7  Hungary      Agitation
#> 8  Hungary Abdominal pain
#> 9   France    Anaphylaxis
#> 10  Canada       Vomiting

That being the case, you would require one table for each condition in each country, which may result in dozens or hundreds of different tables to process (for example, there would be 25 tables even for this little toy example).
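Before building all of the tables, it may help to see a single one. The sketch below builds the 2x2 table from the question for one country and one event, using the toy data above. The names `target_country`, `target_event`, `group` and `event_x` are made up for this illustration, and USA/Vomiting is an arbitrary choice:

```r
# Toy data from the answer above
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# Hypothetical choices, for illustration only
target_country <- "USA"
target_event   <- "Vomiting"

# S = subset of interest, C = comparator (every other country)
group <- factor(ifelse(df$country == target_country, "S", "C"),
                levels = c("S", "C"))

# X = event of interest, Not X = every other event
event_x <- factor(ifelse(df$event == target_event, "X", "Not X"),
                  levels = c("X", "Not X"))

# Rows are S and C, columns are X and Not X, as in the question's layout
tab <- table(group, event_x)
tab

# The four cells of the contingency table
a <- tab["S", "X"]
b <- tab["S", "Not X"]
c_count <- tab["C", "X"]      # named c_count to avoid masking base::c
d <- tab["C", "Not X"]
```

For this toy data the cells come out as a = 1, b = 1, c = 2 and d = 6. Fixing the factor levels explicitly keeps S and X in the first row and column, so the table is laid out the same way as the one in the question.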

To help summarize your data, you could directly get the odds ratio and p value for each condition in each country by creating each table inside an lapply call and doing a Fisher test to extract the statistics out of it. Then you can turn the whole thing into a data frame:

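# Loop over every event and, within it, every country; each pair gives one
# row of the result. `row.names<-`(..., NULL) resets the row names to 1..n.
`row.names<-`(do.call(rbind, lapply(sort(unique(df$event)), function(x) {
  do.call(rbind, lapply(sort(unique(df$country)), function(y) {
    # 2x2 table: rows = "is country y" (FALSE, TRUE),
    #            cols = "is event x"   (FALSE, TRUE)
    tab <- table(df$country == y, df$event == x)
    ft <- fisher.test(tab)
    # The ROW_* columns hold the counts for all other countries (the comparator)
    data.frame(country = y, condition = x, event = tab[2, 2], no_event = tab[2, 1],
               ROW_event = tab[1, 2], ROW_no_event = tab[1, 1],
               odds_ratio = ft$estimate,
               p_value = ft$p.value)
  }))
})), NULL)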
#>    country      condition event no_event ROW_event ROW_no_event odds_ratio
#> 1   Canada Abdominal pain     0        2         2            6   0.000000
#> 2   France Abdominal pain     0        2         2            6   0.000000
#> 3  Hungary Abdominal pain     1        1         1            7   5.291552
#> 4       UK Abdominal pain     1        1         1            7   5.291552
#> 5      USA Abdominal pain     0        2         2            6   0.000000
#> 6   Canada      Agitation     0        2         2            6   0.000000
#> 7   France      Agitation     0        2         2            6   0.000000
#> 8  Hungary      Agitation     1        1         1            7   5.291552
#> 9       UK      Agitation     0        2         2            6   0.000000
#> 10     USA      Agitation     1        1         1            7   5.291552
#> 11  Canada    Anaphylaxis     0        2         1            7   0.000000
#> 12  France    Anaphylaxis     1        1         0            8        Inf
#> 13 Hungary    Anaphylaxis     0        2         1            7   0.000000
#> 14      UK    Anaphylaxis     0        2         1            7   0.000000
#> 15     USA    Anaphylaxis     0        2         1            7   0.000000
#> 16  Canada       Headache     1        1         1            7   5.291552
#> 17  France       Headache     0        2         2            6   0.000000
#> 18 Hungary       Headache     0        2         2            6   0.000000
#> 19      UK       Headache     1        1         1            7   5.291552
#> 20     USA       Headache     0        2         2            6   0.000000
#> 21  Canada       Vomiting     1        1         2            6   2.645752
#> 22  France       Vomiting     1        1         2            6   2.645752
#> 23 Hungary       Vomiting     0        2         3            5   0.000000
#> 24      UK       Vomiting     0        2         3            5   0.000000
#> 25     USA       Vomiting     1        1         2            6   2.645752
#>      p_value
#> 1  1.0000000
#> 2  1.0000000
#> 3  0.3777778
#> 4  0.3777778
#> 5  1.0000000
#> 6  1.0000000
#> 7  1.0000000
#> 8  0.3777778
#> 9  1.0000000
#> 10 0.3777778
#> 11 1.0000000
#> 12 0.2000000
#> 13 1.0000000
#> 14 1.0000000
#> 15 1.0000000
#> 16 0.3777778
#> 17 1.0000000
#> 18 1.0000000
#> 19 0.3777778
#> 20 1.0000000
#> 21 1.0000000
#> 22 1.0000000
#> 23 1.0000000
#> 24 1.0000000
#> 25 1.0000000

Created on 2022-04-14 by the reprex package (v2.0.1)
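With thousands of rows in the real data, it may be easier to store the results and sort them than to read the full printout. The sketch below is one possible way to do that, not part of the original answer: it recomputes only the p-values for the toy data and ranks the country/event pairs. The names `pairs` and `ranked` are made up for this illustration:

```r
# Toy data from the answer above
df <- data.frame(country = c('USA', 'USA', 'Canada', 'UK', 'UK', 'France',
                             'Hungary', 'Hungary', 'France', 'Canada'),
                 event = c('Vomiting', 'Agitation', 'Headache', 'Headache',
                           'Abdominal pain', 'Vomiting', 'Agitation',
                           'Abdominal pain', 'Anaphylaxis', 'Vomiting'))

# One row per country/event combination
pairs <- expand.grid(country = sort(unique(df$country)),
                     event = sort(unique(df$event)),
                     stringsAsFactors = FALSE)

# Fisher test p-value for each pair: this country vs. all others,
# this event vs. all others
pairs$p_value <- mapply(function(cn, ev) {
  fisher.test(table(df$country == cn, df$event == ev))$p.value
}, pairs$country, pairs$event, USE.NAMES = FALSE)

# Smallest p-values first
ranked <- pairs[order(pairs$p_value), ]
head(ranked, 3)
```

In the toy data, the pair that ends up first is France with Anaphylaxis, which is the row with p_value 0.2 in the output above.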
