
Fastest way to union all

I am looking for the fastest way to union all 100,000 vectors of a list into a data frame. This is not a plain do.call(rbind) problem, because I want to put all the values in one column and add the minimum of each list element as its group (see my code below to better understand the output).

I have tried two different approaches that work but are pretty slow, so I am looking for something using data.table, dplyr, or anything else that will make it faster.

Example to reproduce what I want:

a <- c(1:3)
b <- c(12:20)
relations <- list(a, b)

Output with the two different solutions I tried:

Solution 1: concatenate data frames with rbind, looping over the elements of the list:

full_group <- NULL
for (i in seq_along(relations)) {
  # append one data frame per list element; group = that element's minimum
  full_group <- rbind(full_group,
                      data.frame(id = relations[[i]],
                                 group = min(relations[[i]])))
  print(i)  # progress output
}
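Growing full_group with rbind inside the loop copies everything accumulated so far on every iteration. That is why it slows down as the list gets longer. A common base-R variant of the same idea builds all the pieces first and binds them once. This is a sketch only: the name pieces is illustrative, and the example data is repeated so the block runs on its own.

```r
# Example data from the question
a <- c(1:3)
b <- c(12:20)
relations <- list(a, b)

# One small data frame per list element (group = that element's minimum),
# collected in a list and bound together in a single rbind call at the end
pieces <- lapply(relations, function(v) data.frame(id = v, group = min(v)))
full_group <- do.call(rbind, pieces)
```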

Solution 2: concatenate vectors, then create a data frame from the results:

groups <- NULL
id <- NULL
for (i in seq_along(relations)) {
  # grow both vectors: the values, and the element's minimum repeated once per value
  id <- c(id, relations[[i]])
  groups <- c(groups, rep(min(relations[[i]]), length(relations[[i]])))
  print(i)  # progress output
}

full_group <- data.frame(id = id,
                         groups = groups)
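The second loop has the same cost problem, because c(id, ...) copies the whole vector on each pass. If you want to keep an explicit loop, one option is to allocate the output vectors once and fill them in place. This is a sketch that assumes R >= 3.2.0 (for lengths()). The helper names n, start, end and idx are illustrative.

```r
# Example data from the question
a <- c(1:3)
b <- c(12:20)
relations <- list(a, b)

n      <- lengths(relations)      # number of values in each list element
total  <- sum(n)
id     <- numeric(total)          # allocated once, filled in place below
groups <- numeric(total)
end    <- cumsum(n)               # last output position of each element
start  <- end - n + 1             # first output position of each element

for (i in seq_along(relations)) {
  idx <- seq.int(start[i], length.out = n[i])
  id[idx]     <- relations[[i]]
  groups[idx] <- min(relations[[i]])
}

full_group <- data.frame(id = id, groups = groups)
```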

Judging by the output of your second solution, you want what stack does to lists:

stack(setNames(relations,sapply(relations,min)))
   values ind
1       1   1
2       2   1
3       3   1
4      12  12
5      13  12
6      14  12
7      15  12
8      16  12
9      17  12
10     18  12
11     19  12
12     20  12

The call to setNames here sets the names of the groups, in this case the minimum element of each vector. The same code works with melt from reshape2 in place of stack, which, as @akrun points out, may be faster.
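To see what type the group column ends up with after stack, here is a quick base-R check on the example data. It also shows the usual way to turn the factor back into numbers. Calling as.numeric() directly on a factor returns its internal level codes, so the conversion has to go through the labels.

```r
# Example data from the question
a <- c(1:3)
b <- c(12:20)
relations <- list(a, b)

# setNames labels each vector with its minimum, so the names are "1" and "12"
s <- stack(setNames(relations, sapply(relations, min)))

class(s$ind)        # "factor"
as.numeric(s$ind)   # the factor's internal codes (1 and 2), not the minima

# Convert through the labels to recover the numeric group values
s$ind <- as.numeric(as.character(s$ind))
```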

stack and melt, however, store the group as a factor and a character vector, respectively. If a numeric group is wanted (probably the case here), use a slight modification of stack's underlying code:

stack2 <- function(x, i)  # x: list of vectors; i: one group value per element
  data.frame(values = unlist(x), ind = rep.int(i, lengths(x)))

stack2(relations, sapply(relations, min))

This is what @alexis_laz suggested in the comments.
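You can check stack2 against the loop approach from the question and get a rough timing at the question's scale (100,000 elements). The sketch below uses base R only and assumes R >= 3.2.0 for lengths(). The random test data is hypothetical, and timings depend on the machine, so none are quoted here.

```r
# Same definition as in the answer
stack2 <- function(x, i) data.frame(values = unlist(x), ind = rep.int(i, lengths(x)))

set.seed(42)
# Hypothetical data: 100,000 integer vectors with random lengths 1..20
relations <- lapply(sample.int(20, 1e5, replace = TRUE),
                    function(k) sample.int(1e6, k, replace = TRUE))

# Rough timing; results vary by machine
print(system.time(res <- stack2(relations, sapply(relations, min))))

# Cross-check against the question's second (growing-vector) loop on a small subset
sub <- relations[1:200]
id <- NULL
groups <- NULL
for (j in seq_along(sub)) {
  id <- c(id, sub[[j]])
  groups <- c(groups, rep(min(sub[[j]]), length(sub[[j]])))
}
small <- stack2(sub, sapply(sub, min))
stopifnot(identical(small$values, id), identical(small$ind, groups))
```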
