简体   繁体   中英

Compute proportion of outcome from repeated measures design

I have a table in the following format:

CowId    Result          IMI
1        S. aureus       1
1        No growth       0
2        No growth       0
2        No growth       0
3        E. coli         1
3        No growth       0
3        E. coli         0
4        Bacillus sp.    1
4        Contaminated    0

From this table, I would like to compute the proportion of CowIds that are negative for an IMI (0 = negative; 1 = positive) at all sampling time points.

In this example, 25% of cows [CowId = 2] tested negative for an IMI at all sampling time points.

To compute this proportion, my initial approach was to group each CowId, then compute the difference between the number of negative IMIs and the total number of IMI tests, where a resulting value of 0 would indicate that the cow was negative for an IMI at all time points.

As of now, my code computes this for each individual CowId. How can I augment this to compute the proportion described above?

fp %>%
  filter(Result != "Contaminated") %>%
  group_by(CowId) %>%
  summarise(negative = (sum(IMI == 0) - length(IMI)))

We can count how many CowId 's have tested negative at all points and calculate their ratio.

library(dplyr)

fp %>%
  filter(Result != "Contaminated") %>%
  group_by(CowId) %>%
  summarise(negative = all(IMI == 0)) %>%
  summarise(total_percent = mean(negative) * 100)

# total_percent
#          <dbl>
#1            25

In base R, we can use aggregate

temp <- aggregate(IMI~CowId, subset(fp, Result != "Contaminated"), 
                  function(x) all(x == 0))

mean(temp$IMI) * 100

data

fp <- structure(list(CowId = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L), 
Result = structure(c(5L, 4L, 4L, 4L, 3L, 4L, 3L, 1L, 2L), .Label = 
c("Bacillus_sp.","Contaminated", "E.coli", "No_growth", "S.aureus"), 
class = "factor"),IMI = c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L)), 
class = "data.frame", row.names = c(NA, -9L))

With data.table

library(data.table)
setDT(fp)[Result != "Contaminated", .(negative = all(IMI == 0)), 
      .(CowId)][, .(total_percent = mean(negative)* 100 )]
#   total_percent
#1:            25

data

fp <- structure(list(CowId = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L), 
Result = structure(c(5L, 4L, 4L, 4L, 3L, 4L, 3L, 1L, 2L), .Label = 
c("Bacillus_sp.","Contaminated", "E.coli", "No_growth", "S.aureus"), 
class = "factor"),IMI = c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L)), 
class = "data.frame", row.names = c(NA, -9L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM