
How do I find the intersection of lists of lists in R?

Say I have a list of three data frames:

> a
[[1]]
     begin end
     3     5
     9     10
     11    14

[[2]]
     begin end
     3     7
     14    18
     19    24

[[3]]
     begin end
     6     9
     14    22
     18    30

What I am trying to find is the intersection of all of the "begin" columns, so in this case the desired output would be something like:

"3" "14"

I am aware of the solution offered at "How to find common elements from multiple vectors?"; however, that solution assumes that the number of lists is fixed. If the number of lists I have here were to change (say, to 5 lists, each with a similar column layout), how would I find the intersection?

An easy way is to collapse the list elements into one vector and use table to count them:

# Recreate the list of data frames
a <- list(
    data.frame(begin = c(3, 9, 11), end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(6, 14, 18), end = c(9, 22, 30)))

# "Collapse" the begin columns into a single vector.
# We use unlist in case the data frames do not all have
# the same number of rows (thanks @Frank for pointing this out)
a.beg <- unlist(sapply(a, function(x){x$begin}))

# Count the elements
tb <- table(a.beg)

# Get the values that occur at least twice
# (need to cast to numeric, as the names of a table are strings)
intersection <- as.numeric(names(tb[tb >= 2]))

> intersection
[1]  3 14
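
One caveat with counting: a value that occurs twice within the same begin column would also reach a count of 2. The following is a minimal sketch, not part of the original answer, that removes duplicates within each data frame before counting. The names min.count, shared and in.all are hypothetical and used only for illustration.

```r
# Same example data as above
a <- list(
    data.frame(begin = c(3, 9, 11), end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(6, 14, 18), end = c(9, 22, 30)))

# Deduplicate within each data frame, then pool all begin values
a.beg <- unlist(lapply(a, function(x) unique(x$begin)))

# Each count is now the number of list elements containing the value
tb <- table(a.beg)

# min.count (hypothetical name): how many list elements must share a value
min.count <- 2
shared <- as.numeric(names(tb[tb >= min.count]))

# Values present in every list element (the strict intersection)
in.all <- as.numeric(names(tb[tb == length(a)]))
```

With this data, shared contains 3 and 14, while in.all is empty because no begin value occurs in all three data frames.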

Using @nico's input data...

full <- do.call(rbind, lapply(seq_along(a), function(i) within(a[[i]], {g = i})) )

res  <- table(full[,c("begin","g")])

#      g
# begin 1 2 3
#    3  1 1 0
#    6  0 0 1
#    9  1 0 0
#    11 1 0 0
#    14 0 1 1
#    18 0 0 1
#    19 0 1 0

The rows are the unique values of begin and the columns are the elements of the list. To see which values of begin appear in more than one element of the list, look at:

res[ rowSums( res>0 ) > 1, ]
#      g
# begin 1 2 3
#    3  1 1 0
#    14 0 1 1
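
If a plain vector of values is wanted rather than a sub-table, the row names of the table can be filtered with the same condition. This is a small sketch built on the table above; the name shared is hypothetical.

```r
# Same example data as above
a <- list(
    data.frame(begin = c(3, 9, 11), end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(6, 14, 18), end = c(9, 22, 30)))

# Stack the data frames, tagging each row with its list position g
full <- do.call(rbind, lapply(seq_along(a), function(i) within(a[[i]], {g = i})))
res  <- table(full[, c("begin", "g")])

# Keep the begin values that occur in more than one list element
shared <- as.numeric(rownames(res)[rowSums(res > 0) > 1])
```

Because the test is res > 0, a value repeated within a single data frame is still counted only once for that element.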

Any further analysis should probably be done on full rather than on your list of data.frames, especially if efficiency is a concern.
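
For the strict intersection across any number of list elements, base R's Reduce can fold intersect over the begin columns. This is a sketch of an alternative technique rather than either answer's method; the names begins, common and common.12 are hypothetical.

```r
# Same example data as above
a <- list(
    data.frame(begin = c(3, 9, 11), end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(6, 14, 18), end = c(9, 22, 30)))

# Extract the begin column from every data frame, however many there are
begins <- lapply(a, function(x) x$begin)

# Fold intersect over the list: values present in every element
common <- Reduce(intersect, begins)

# Restricting to the first two elements gives their pairwise intersection
common.12 <- Reduce(intersect, begins[1:2])
```

With this data common is empty, since no value occurs in all three begin columns, which is why the answers above count values shared by at least two elements instead; common.12 is 3.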

