Hello everyone, I am working with a large list that contains lists. Each of the sub-lists contains n elements. I always want to get the 3rd one, e.g.:
l = list()
l[[1]] = list(A=runif(1), B=runif(1), C=runif(1))
l[[2]] = list(A=runif(1), B=runif(1), C=runif(1))
l[[3]] = list(A=runif(1), B=runif(1), C=runif(1))
res = sapply(l, function(x) x$C)
res = sapply(l, function(x) x[[3]]) #alternative
But my list contains several thousand elements and I perform this operation many times. So, is there a faster way to do the operation above?
Best regards,
Mario
If you do this multiple times, it would be better to convert your list to an easier structure such as a data.table.
library(data.table)
DT = rbindlist(l)  # bind the sub-lists into one data.table, one row per sub-list
res = DT$C
# or if you prefer the 3rd element, not necessarily called 'C' then:
res = DT[[3]] # or DT[, C], which might be faster; see @richard-scriven's comment
Alternatively, if you want to stay in base R, you could use rbind:
res = do.call(rbind.data.frame, l)$C # or [[3]]
Would this make things easier?
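As a quick sanity check, here is a minimal base R sketch suggesting that the element-wise extraction and the rbind approach return the same values. The 5-element list built with replicate and the fixed seed are assumptions made for illustration, not part of the original question:

```r
# Small example list with the same shape as in the question (assumed size: 5)
set.seed(42)
l <- replicate(5, list(A = runif(1), B = runif(1), C = runif(1)), simplify = FALSE)

# Element-wise extraction, as in the question
res_sapply <- sapply(l, function(x) x$C)

# Bind once into a data.frame, then take the column
DF <- do.call(rbind.data.frame, l)
res_df <- DF$C

# Both approaches should give the same numeric vector
stopifnot(isTRUE(all.equal(res_sapply, res_df)))
```

Once DF exists, every later extraction is just DF$C, so the binding cost is only paid once.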
UPDATE
Here are some benchmarks showing different solutions to the problem:
preparations:
library(data.table)
library(microbenchmark)
# creating a list and filling it with items
nbr <- 1e5
l <- vector("list", nbr)
for (i in 1:nbr) {
  l[[i]] <- list(A = runif(1), B = runif(1), C = runif(1))
}
# creating data.frame and data.table versions
DT <- rbindlist(l)
DF <- data.frame(rbindlist(l))
benchmarking:
# doing the benchmarking
op <-
  microbenchmark(
    LAPPLY.1 = lapply(l, function(x) x$C),
    LAPPLY.2 = lapply(l, `[`, "C"),
    LAPPLY.3 = lapply(l, `[[`, "C"),
    SAPPLY.1 = sapply(l, function(x) x$C),
    SAPPLY.2 = sapply(l, function(x) x[[3]]),
    SAPPLY.3 = sapply(l, `[[`, 3),
    DT.1 = rbindlist(l)$C,
    DT.2 = DT$C,
    DF.2 = DF$C,
    times = 100
  )
results:
op
## Unit: microseconds
##      expr    min     lq   mean median     uq    max neval
##  LAPPLY.1 124088 142390 161672 154415 163240 396761   100
##  LAPPLY.2 111397 134745 156012 150062 165229 364539   100
##  LAPPLY.3  66965  71608  82975  77329  84949 323041   100
##  SAPPLY.1 133220 149093 166653 159222 172495 311857   100
##  SAPPLY.2 105917 119533 137990 133364 139216 346759   100
##  SAPPLY.3  70391  74726  81910  80520  85792 110062   100
##      DT.1  46895  48943  49113  49178  49391  51377   100
##      DT.2      8     18     37     47     49     58   100
##      DF.2      7     13     33     40     42     82   100
(1) In general, it is best to use a table-like structure such as a data.frame or data.table in the first place. Selecting a column from those takes the least time (DT.2 and DF.2 have medians in the tens of microseconds).
(2) If this is not possible, it is better to first turn the list into a data.frame or data.table and then extract the values in one single operation.
(3) Interestingly, using sapply or lapply with the (optimized) base R [[ function results in processing times that are less than twice those of using rbindlist and then extracting the values as a column (medians of about 77 to 81 ms for LAPPLY.3 and SAPPLY.3 versus about 49 ms for DT.1).
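If the list has to stay a list, one base R variant that was not part of the benchmark above is vapply, which works like sapply but checks that every result has the declared type and length. The following is only a minimal sketch, assuming each sub-list has a single numeric element named C; the list size and seed are made up for illustration, and no timing claim is made here:

```r
# Example list with the same shape as in the benchmark (assumed size: 1000)
set.seed(1)
l <- replicate(1000, list(A = runif(1), B = runif(1), C = runif(1)), simplify = FALSE)

# Extract element "C" from every sub-list;
# FUN.VALUE = numeric(1) declares that each result is one numeric value
res_vapply <- vapply(l, `[[`, FUN.VALUE = numeric(1), "C")

# Same extraction with the sapply call used in the benchmark (SAPPLY.3)
res_sapply <- sapply(l, `[[`, 3)

# Both calls should return the same numeric vector
stopifnot(isTRUE(all.equal(res_vapply, res_sapply)))
```

Unlike sapply, vapply stops with an error if a sub-list is missing C or holds something other than a single number, which can help catch malformed entries in a large list.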