Compute proportion of outcome from repeated measures design

Question

I have a table in the following format:

CowId    Result          IMI
1        S. aureus       1
1        No growth       0
2        No growth       0
2        No growth       0
3        E. coli         1
3        No growth       0
3        E. coli         0
4        Bacillus sp.    1
4        Contaminated    0

From this table, I would like to compute the proportion of CowIds that are negative for an IMI (0 = negative; 1 = positive) at all sampling time points.

In this example, 25% of cows [CowId = 2] tested negative for an IMI at all sampling time points.

To compute this proportion, my initial approach was to group by CowId, then compute the difference between the number of negative IMI results and the total number of IMI tests for each cow. A difference of 0 would indicate that the cow was negative for an IMI at all time points.

As of now, my code computes this for each individual CowId. How can I augment this to compute the proportion described above?

fp %>%
  filter(Result != "Contaminated") %>%
  group_by(CowId) %>%
  summarise(negative = (sum(IMI == 0) - length(IMI)))

Answer 1

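We can flag each CowId that tested negative at all time points and then take the proportion of flagged cows. all(IMI == 0) returns TRUE or FALSE per cow, and the mean of a logical vector is the proportion of TRUE values.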

library(dplyr)

fp %>%
  filter(Result != "Contaminated") %>%
  group_by(CowId) %>%
  summarise(negative = all(IMI == 0)) %>%
  summarise(total_percent = mean(negative) * 100)

# total_percent
#          <dbl>
#1            25
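
The difference-based summary from the question can also be kept as is and turned into a proportion with one more step. The following is a minimal sketch, assuming the same column names (CowId, Result, IMI). The data frame is rebuilt inline with plain character labels so the block runs on its own, and the names per_cow and total_percent are only illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt inline (assumption: plain
# character labels rather than the factor used in the data section).
fp <- data.frame(
  CowId  = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
  Result = c("S. aureus", "No growth", "No growth", "No growth",
             "E. coli", "No growth", "E. coli", "Bacillus sp.",
             "Contaminated"),
  IMI    = c(1, 0, 0, 0, 1, 0, 0, 1, 0)
)

# Per cow: number of negative tests minus number of tests.
# A value of 0 means every retained test was negative.
per_cow <- fp %>%
  filter(Result != "Contaminated") %>%
  group_by(CowId) %>%
  summarise(negative = sum(IMI == 0) - n())

# Share of cows whose difference is exactly 0, as a percentage.
total_percent <- mean(per_cow$negative == 0) * 100
```

Comparing the difference to 0 yields the same per-cow flag as all(IMI == 0), so both routes should agree on the same data.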

In base R, we can use aggregate to build the same per-cow flag and then average it:

temp <- aggregate(IMI~CowId, subset(fp, Result != "Contaminated"), 
                  function(x) all(x == 0))

mean(temp$IMI) * 100
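
If the underlying counts are wanted alongside the percentage, the same idea can be written with tapply. This is a sketch under the assumption that CowId is stored as a plain numeric or integer column; the data are rebuilt inline and the names kept, all_negative, n_negative, n_cows and percent_negative are hypothetical.

```r
# Example data from the question, rebuilt inline with character labels.
fp <- data.frame(
  CowId  = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
  Result = c("S. aureus", "No growth", "No growth", "No growth",
             "E. coli", "No growth", "E. coli", "Bacillus sp.",
             "Contaminated"),
  IMI    = c(1, 0, 0, 0, 1, 0, 0, 1, 0)
)

# Drop contaminated samples before judging each cow.
kept <- subset(fp, Result != "Contaminated")

# One logical per cow: TRUE if every retained test was negative.
all_negative <- tapply(kept$IMI, kept$CowId, function(x) all(x == 0))

n_negative <- sum(all_negative)      # cows negative at all time points
n_cows     <- length(all_negative)   # cows with at least one usable sample
percent_negative <- 100 * n_negative / n_cows
```

Note that the denominator here is the number of cows with at least one non-contaminated sample. A cow whose samples were all contaminated would be removed by the filter and would not be counted at all.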

data

fp <- structure(list(CowId = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L), 
Result = structure(c(5L, 4L, 4L, 4L, 3L, 4L, 3L, 1L, 2L), .Label = 
c("Bacillus_sp.","Contaminated", "E.coli", "No_growth", "S.aureus"), 
class = "factor"),IMI = c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L)), 
class = "data.frame", row.names = c(NA, -9L))

Answer 2

With data.table, the filter, the per-cow flag, and the final percentage can be chained:

library(data.table)
setDT(fp)[Result != "Contaminated", .(negative = all(IMI == 0)), 
      .(CowId)][, .(total_percent = mean(negative)* 100 )]
#   total_percent
#1:            25

