
Merging a list of lists in R, keeping only the elements that are not present in a higher-order list

OK, this might be a bit hard to explain, but bear with me:

Suppose I have a list of lists. Each inner list consists of the same elements as the one before, although fewer and fewer of them as we "level up" in the grouping of elements:

level.list <-  list(
list(1,2,3,4,5,6,7,8,9,10,11,12,13,14), # base level 
list(c(1,2,3),c(4,5),c(6,7),c(13,14)),     # level 2 groups 
list(c(1,2,3,6,7),c(4,5,9)),      # level 3 groups    
list(c(4,5,9,12))    # level 4 groups 
)

So each list in the outer list contains some of the elements from the list before it, merged into larger groups.

The thing is, if a group in a list isn't present in a "higher level" list, then that group is the final one. If its elements are present in a higher level list, the higher level group takes priority. For example, the group [6,7] formed at level 2 is merged with the group [1,2,3] at level 3, so neither the level 2 group [6,7] nor the level 2 group [1,2,3] should be part of the final list: both are contained in the shared group [1,2,3,6,7], which is given priority.

The list elements are indexes in a dataset, and the lists group the observations at higher and higher levels. So in effect, this is a "halfway done" list for creating a grouping variable.

I simply don't know how to go about this: how to merge the first list with the other lists while removing the "lower order" groupings. I want a matrix/df that contains the highest level each element is in, as well as a second number that tells me which group the element is in at that level. The matrix/df should look like this:

group.matrix <- matrix(c(
1     , "3.1" ,
2     , "3.1" ,
3     , "3.1" ,
4     , "4.1" ,
5     , "4.1" ,
6     , "3.1" ,
7     , "3.1" ,
8     , "1.1" ,
9     , "4.1" ,
10    , "1.2" ,
11    , "1.3" ,
12    , "4.1" ,
13    , "2.2" , 
14    , "2.2" 
          ), 
           nrow = 14, ncol = 2, byrow = TRUE)
colnames(group.matrix) <- c("first.level","group")

Here the elements are somewhat ordered; this is not the case in my real-life data. I hope my question is clear and that you can help me! I have two weeks to hand in my master's thesis, and I'm simply in over my head with this problem, but I need to solve it in order to analyse something essential in the thesis :/ .

Thank you for your time.

EDIT: I have updated the question and the toy example accordingly.

Here's a solution using base functions:

at_levels <- Map(function(i, x) cbind(i=i, x=unlist(x)), seq_along(level.list), level.list)
aggregate(i~x, do.call("rbind", at_levels), max)

#     x i
# 1   1 3
# 2   2 3
# 3   3 3
# 4   4 4 
# 5   5 4
# 6   6 3
# 7   7 3
# 8   8 1
# 9   9 4
# 10 10 1
# 11 11 1
# 12 12 4
# 13 13 2
# 14 14 2

Basically, I use Map() to record the level at which each number appears (allowing duplicates), then I use aggregate() to find the max level for each value. This may not be the most efficient method if you have millions of rows or something, but it should be pretty straightforward to understand.
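The output above gives only the highest level, while the question also asks for a group label within that level. The same long-format idea could be extended along these lines. This is only a sketch: the column names (x, level, pos, group) are made up for illustration, and it assumes an element appears in at most one group per level. The labelling scheme is also an assumption: surviving groups are renumbered 1, 2, 3, ... within each level, which reproduces the "1.1", "1.2", "1.3", "3.1" and "4.1" labels from the question but gives "2.1" rather than "2.2" for elements 13 and 14.

```r
level.list <- list(
  list(1,2,3,4,5,6,7,8,9,10,11,12,13,14),   # base level
  list(c(1,2,3), c(4,5), c(6,7), c(13,14)), # level 2 groups
  list(c(1,2,3,6,7), c(4,5,9)),             # level 3 groups
  list(c(4,5,9,12))                         # level 4 groups
)

# One row per occurrence: element, level, and position of its group in that level
long <- do.call(rbind, Map(function(lvl, groups) {
  data.frame(
    x     = unlist(groups),
    level = lvl,
    pos   = rep(seq_along(groups), lengths(groups))
  )
}, seq_along(level.list), level.list))

# For each element keep only the row with the highest level
long <- long[order(long$x, -long$level), ]
top  <- long[!duplicated(long$x), ]

# Renumber the surviving groups 1, 2, 3, ... within each level (assumed scheme)
renumber  <- function(p) match(p, sort(unique(p)))
top$group <- paste(top$level, ave(top$pos, top$level, FUN = renumber), sep = ".")

top[, c("x", "group")]
```

Since the labels only need to distinguish groups, pasting level and pos together directly (without renumbering) would work just as well as a grouping variable.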

Another attempt using max.col and mapply along with %in% to do the grunt work of checking if a value is in a higher level:

max.col(mapply(`%in%`, level.list[1], lapply(level.list, unlist)), "last")
#[1] 3 3 3 4 4 3 3 1 4 1 1 4 2 2
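To make the one-liner easier to follow, here is roughly the same computation unpacked into steps. This is a sketch, not a drop-in replacement: the names base, membership and highest are made up for illustration, and it assumes every element appears at the base level, so that each row of the matrix has at least one TRUE.

```r
level.list <- list(
  list(1,2,3,4,5,6,7,8,9,10,11,12,13,14),   # base level
  list(c(1,2,3), c(4,5), c(6,7), c(13,14)), # level 2 groups
  list(c(1,2,3,6,7), c(4,5,9)),             # level 3 groups
  list(c(4,5,9,12))                         # level 4 groups
)

# All elements, taken from the base level
base <- unlist(level.list[[1]])

# Logical matrix with one row per element and one column per level:
# membership[i, j] is TRUE when element i occurs anywhere at level j
membership <- sapply(level.list, function(lvl) base %in% unlist(lvl))

# With ties.method = "last", max.col() returns the right-most TRUE column,
# i.e. the highest level at which each element appears
highest <- max.col(membership, ties.method = "last")

data.frame(x = base, level = highest)
```

Because the result is in the order of the base level, it lines up with the elements even when they are not sorted.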
