R - Conditionally summarize data from all possible column pairs

I have a table that lists the presence/absence of each organism across several different conditions. My goal is to generate a new table that lists, for each pair of organisms, the values needed to draw that pair's Venn diagram.

...put another way: for each pair of organisms, I want a table summarizing:

  1. the number of conditions that they share (organism1 == 1 & organism2 == 1)
  2. the number of conditions unique to organism1 (organism1 == 1 & organism2 == 0)
  3. the number of conditions unique to organism2 (organism1 == 0 & organism2 == 1)
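
For a single pair, the three counts above can be sketched in base R as sums over logical vectors. The vectors `organism1` and `organism2` and their values are illustrative placeholders, not part of my real data:

```r
# Hypothetical presence/absence vectors (1 = present, 0 = absent),
# one element per condition
organism1 <- c(1, 1, 0, 0, 1, 0, 1)
organism2 <- c(0, 1, 0, 1, 0, 1, 1)

# Each comparison yields a logical vector; sum() counts the TRUEs
shared      <- sum(organism1 == 1 & organism2 == 1)  # conditions with both
only_first  <- sum(organism1 == 1 & organism2 == 0)  # unique to organism1
only_second <- sum(organism1 == 0 & organism2 == 1)  # unique to organism2

counts <- c(shared = shared, only_first = only_first, only_second = only_second)
print(counts)
```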

My current method is below, but my real presence/absence table is much larger, so it would be great if there were a more concise way to automate this (i.e. a for-loop?).

Example Presence/Absence Table (rows=conditions, columns=organisms):

library(data.table)

paData <- data.table(
  Pyro = c(1,1,0,0,1,0,1),
  Anth = c(0,1,0,1,0,1,1),
  Tric = c(1,1,0,1,0,1,1))
 
paData
   Pyro Anth Tric
1:    1    0    1
2:    1    1    1
3:    0    0    0
4:    0    1    1
5:    1    0    0
6:    0    1    1
7:    1    1    1

For each pair of organisms (columns) designate whether one, both, or neither organism was present in each condition (row):

paData$PyroAnth <- ifelse(paData[,1] ==1 & 
                            paData[,2] ==0, "V1alone",
                        ifelse(paData[,1] ==1 & 
                                 paData[,2] ==1, "Overlap",
                               ifelse(paData[,1] ==0 & 
                                        paData[,2] ==1, "V2alone", 
                                            "NA")))

paData$PyroTric <- ifelse(paData[,1] ==1 & 
                           paData[,3] ==0, "V1alone",
                       ifelse(paData[,1] ==1 & 
                                paData[,3] ==1, "Overlap",
                              ifelse(paData[,1] ==0 & 
                                       paData[,3] ==1, "V2alone", 
                                     "NA")))

paData$AnthTric <- ifelse(paData[,2] ==1 & 
                           paData[,3] ==0, "V1alone",
                         ifelse(paData[,2] ==1 & 
                                  paData[,3] ==1, "Overlap",
                                ifelse(paData[,2] ==0 & 
                                         paData[,3] ==1, "V2alone", 
                                       "NA")))

paData
   Pyro Anth Tric PyroAnth PyroTric AnthTric
1:    1    0    1  V1alone  Overlap  V2alone
2:    1    1    1  Overlap  Overlap  Overlap
3:    0    0    0       NA       NA       NA
4:    0    1    1  V2alone  V2alone  Overlap
5:    1    0    0  V1alone  V1alone       NA
6:    0    1    1  V2alone  V2alone  Overlap
7:    1    1    1  Overlap  Overlap  Overlap
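
The repeated nested `ifelse()` blocks above could probably be folded into one helper applied to every column pair. This is only a sketch in base R: `classify_pair`, `pa`, and `labels` are hypothetical names, a plain data.frame stands in for the data.table, and a real `NA` is used instead of the string "NA":

```r
# Hypothetical helper: label each condition for one pair of columns
classify_pair <- function(a, b) {
  ifelse(a == 1 & b == 0, "V1alone",
  ifelse(a == 1 & b == 1, "Overlap",
  ifelse(a == 0 & b == 1, "V2alone", NA_character_)))
}

# Same example data as above, held in a plain data.frame
pa <- data.frame(
  Pyro = c(1, 1, 0, 0, 1, 0, 1),
  Anth = c(0, 1, 0, 1, 0, 1, 1),
  Tric = c(1, 1, 0, 1, 0, 1, 1))

# All unordered column pairs, as a list of length-2 character vectors
pairs <- combn(names(pa), 2, simplify = FALSE)

# One label vector per pair, named e.g. "PyroAnth"
labels <- lapply(pairs, function(p) classify_pair(pa[[p[1]]], pa[[p[2]]]))
names(labels) <- vapply(pairs, paste0, character(1), collapse = "")
labels <- as.data.frame(labels)

print(labels)
```

Calling `table(labels$PyroAnth)` would then give the per-category counts for that pair.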

Create the desired output table: for each pair of organisms, count the number of conditions (rows) where each organism was present "alone" and where its presence "overlapped" with that of the second organism.

DesiredOutput <- data.frame(rbind(list(names(paData[,1]), names(paData[,2]),
                                       nrow(paData[PyroAnth == "V1alone"]),
                                       nrow(paData[PyroAnth == "Overlap"]),
                                       nrow(paData[PyroAnth == "V2alone"])),
                                  list(names(paData[,1]), names(paData[,3]),
                                       nrow(paData[PyroTric == "V1alone"]),
                                       nrow(paData[PyroTric == "Overlap"]),
                                       nrow(paData[PyroTric == "V2alone"])),
                                  list(names(paData[,2]), names(paData[,3]),
                                       nrow(paData[AnthTric == "V1alone"]),
                                       nrow(paData[AnthTric == "Overlap"]),
                                       nrow(paData[AnthTric == "V2alone"]))))

colnames(DesiredOutput) <- c("V1", "V2", "V1alone", "Overlap", "V2alone")

DesiredOutput
    V1   V2 V1alone Overlap V2alone
1 Pyro Anth       2       2       2
2 Pyro Tric       1       3       2
3 Anth Tric       0       4       1

How could this be automated to efficiently create my "DesiredOutput" table for dozens of organisms and hundreds of conditions?

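You could try this approach. Run it on the original three-column paData, before the PyroAnth, PyroTric and AnthTric helper columns are added, because combn(names(paData), 2) pairs up every column in the table: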

f <- function(v1,v2) list(sum(v1 & !v2),sum(v1 & v2),sum(!v1 & v2))

result = data.table(t(combn(names(paData),2)))

result[, c("v1alone", "overlap", "v2alone"):=f(paData[[V1]], paData[[V2]]), by=1:nrow(result)]

Output:

     V1   V2 v1alone overlap v2alone
1: Pyro Anth       2       2       2
2: Pyro Tric       1       3       2
3: Anth Tric       0       4       1
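
For dozens of organisms, a matrix formulation may also be worth trying. The sketch below is an alternative to the row-wise data.table call, not a replacement for it: it assumes the data are strictly 0/1, uses base R only, and the names `m`, `overlap`, `totals`, and `result` are illustrative. The idea is that `crossprod(m)` counts, for every pair of columns, the conditions where both are 1, and each "alone" count is the column total minus that overlap:

```r
# Presence/absence as a numeric 0/1 matrix (rows = conditions)
m <- cbind(
  Pyro = c(1, 1, 0, 0, 1, 0, 1),
  Anth = c(0, 1, 0, 1, 0, 1, 1),
  Tric = c(1, 1, 0, 1, 0, 1, 1))

overlap <- crossprod(m)  # overlap[i, j] = conditions where i and j are both present
totals  <- colSums(m)    # conditions where each organism is present at all

# One row per unordered pair of organisms
pairs <- t(combn(colnames(m), 2))
both  <- overlap[pairs]  # look up each pair by (row name, column name)

result <- data.frame(
  V1      = pairs[, 1],
  V2      = pairs[, 2],
  V1alone = unname(totals[pairs[, 1]] - both),
  Overlap = both,
  V2alone = unname(totals[pairs[, 2]] - both))

print(result)
```

On the example data this gives the same counts as the table above.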
