简体   繁体   中英

R: obtain single data frame from list of zoo-objects

I have a list of zoo-bjects consisting of irregular time-series, lodf, in the following format:

> head(lodf)

[[1]]
2014-08-08 2014-08-14 2014-09-12  
  1.15       1.32       2.39 

[[2]]  
2014-07-22 2014-07-24 2014-08-14 2014-08-20 2014-08-27 2014-09-12 
  0.50       0.75       1.29       1.36       1.28       1.28   

[[3]]
2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01 
  1.00       1.27       0.91       1.00       0.99       0.98 

...

I am ultimately trying to sum all these time-series into one combined time-series, ie sum down each column. To do this, I am trying to convert into a zoo/xts time-series for further manipulation , ie to apply na.locf and other zoo-library capabilities before summing across the individual data frames/dates using rowsum. ie I am trying to get my list of date frames above into a combined zoo object resembling this:

           Value
12/09/2014  1.07
14/08/2014  1.32
08/08/2014  1.15
12/09/2014  0.48
27/08/2014  0.53
20/08/2014  0.61
14/08/2014  0.54
24/07/2014  0.75
22/07/2014  0.5
01/07/2014  0.98
01/07/2014  0
...

There is often over-lap between the individual data frames ie several values corresponding to the same date index, and What I would like to do in those cases is to sum the values. Eg if I have

012-11-01
  0.7

012-11-01
 1.5

012-11-01
 0.7

I would like to have

012-11-01
2.9

as the value for this date index in the resulting large data frame.

I have tried merge, reading as a zoo object, do.call(rbind) etc. in the current format, but I am stumped. For further context, this question is part of a larger project outlined here: R: time series with duplicate time index entries . Any help would be most appreciated!

Update: please find a data object below as requested:

> dput(head(lodf))
list(structure(c(1.15, 1.32, 2.39), index = structure(c(16290L, 
16296L, 16325L), class = "Date"), class = "zoo"), structure(c(0.5, 
0.75, 1.29, 1.36, 1.28, 1.28), index = structure(c(16273L, 16275L, 
16296L, 16302L, 16309L, 16325L), class = "Date"), class = "zoo"), 
structure(c(1, 1.27, 0.91, 1, 0.99, 0.98), index = structure(c(15645L, 
15646L, 15898L, 15930L, 15959L, 16252L), class = "Date"), class = "zoo"), 
structure(c(1.27, 1.29, 1.28, 1.17, 0.59, 0), index = structure(c(15645L, 
15651L, 15665L, 15679L, 15686L, 15747L), class = "Date"), class = "zoo"), 
structure(c(1.9, 1.35, 0.66, 1.16, 0.66, 1.16, 1.26, 1.23, 
1.28, 1.23, 1.17, 0.66, 1.18, 0.66, 1.29, 1.35, 1.45, 1.53, 
1.61, 1.82, 1.8, 1.89, 1.8, 1.81, 1.78, 1.68, 2.18, 1.68, 
1.56, 1.93, 1.84, 1.69, 1.18, 1.73, 1.18, 1.72, 1.83, 1.9, 
1.99, 1.93, 1.87, 1.96, 2.1, 2.22, 2.33, 2.38, 2.35, 2.23, 
2.16, 2.18, 2.17, 2.2, 2.29, 2.27, 2.28, 2.42, 2.48, 2.99, 
2.56, 2.65, 2.69, 3.21, 2.7, 2.8, 2.79, 2.8, 2.78, 2.26, 
2.78, 2.26, 2.12, 2.07, 1.97, 1.84, 1.77, 1.18, 1.7, 1.78, 
1.91, 1.98, 1.93, 1.83, 1.76, 1.18, 1.01, 0.97, 0.86, 0.69, 
0.56), index = structure(c(15645L, 15652L, 15660L, 15740L, 
15797L, 15841L, 15860L, 15867L, 15876L, 15887L, 15890L, 15897L, 
15901L, 15905L, 15908L, 15909L, 15910L, 15911L, 15915L, 15926L, 
15931L, 15932L, 15938L, 15953L, 15954L, 15975L, 15978L, 15979L, 
15981L, 15982L, 15985L, 15986L, 15987L, 16001L, 16003L, 16006L, 
16008L, 16010L, 16014L, 16016L, 16021L, 16022L, 16023L, 16027L, 
16029L, 16031L, 16045L, 16052L, 16059L, 16072L, 16077L, 16078L, 
16084L, 16091L, 16098L, 16100L, 16101L, 16106L, 16132L, 16133L, 
16134L, 16139L, 16146L, 16150L, 16153L, 16157L, 16160L, 16163L, 
16167L, 16169L, 16170L, 16171L, 16175L, 16177L, 16182L, 16184L, 
16212L, 16216L, 16220L, 16224L, 16248L, 16254L, 16258L, 16261L, 
16297L, 16301L, 16309L, 16310L, 16317L), class = "Date"), class = "zoo"), 
structure(c(3.35, 3.44, 3.41, 3.14, 3.11, 2.55, 2.65, 2.87, 
3.14, 3.24, 3.41, 4.04, 4.19, 4.34, 4.44, 1.2, 1.3, 1.29, 
1.3, 1.27, 0.77, 0.69, 0.55, 0), index = structure(c(15645L, 
15650L, 15694L, 15740L, 15741L, 15742L, 15743L, 15749L, 15750L, 
15751L, 15755L, 15756L, 15758L, 15762L, 15784L, 15800L, 15805L, 
15810L, 15824L, 15835L, 15838L, 15840L, 15847L, 15849L), class = "Date"), class = "zoo"))
> 

The input displayed at the top of the question appears to be the first three components of the input specified at the bottom of the question. The variable name used at the bottom of the question, lodf , seems to suggest that it contains a list of data frames but in fact it contains a list of zoo objects.

The question asks for a single data frame result but we are assuming that the output should be a single zoo series too, for consistency. Also we shall use the name L for the input as lodf would wrongly suggest a list of data frames. If z is the result as a zoo series then

data.frame(index = index(z), data = coredata(z))

could be used if a data frame really were desired.

In the output section near the end of this answer we show the result of using as our input L <- lodf[1:3] (ie first 3 components only) and separately show the output using L <- lodf (ie all components) as our input.

1) Reduce . We merge the zoo series in the list, L , returning a list and filling in missing values with 0. Then use Reduce to sum the components:

Reduce(`+`, do.call(merge, c(L, retclass = "list", fill = 0)))

1a) A variation of this is to return a zoo object from merge (which is the default if we do not specify retclass ), then fill in its NAs with 0, turn it back into a list and use Reduce :

Reduce(`+`, as.list(na.fill(do.call(merge, L), 0)))

2) rowSums In this solution we merge the lists to give zoo object z , optionally add column names and then add across rows producing the final zoo object.

z <- do.call(merge, L)
colnames(L) <- seq_along(L) # optionally add names
zoo(rowSums(z, na.rm = TRUE), time(z))

Note that a rowSums solution of zoo objects previously appeared here

3) + If we knew that there were exactly 3 components to the list then an alternate way to write the above would be this. We optionally add names 1, 2, 3, merge the zoo objects and fill NAs with 0. Finally we add the series together. Modify in the obvious way if the number of components differs.

z0 <- na.fill(do.call(merge, L), 0)
colnames(z0) <- 1:3 # optionally add names 1, 2, 3
z0[, 1] + z0[, 2] + z0[, 3]

Output Using L <- lodf[1:3] as displayed at the start of the question where lodf is shown at the bottom of the question our output is:

2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01 2014-07-22 
      1.00       1.27       0.91       1.00       0.99       0.98       0.50 
2014-07-24 2014-08-08 2014-08-14 2014-08-20 2014-08-27 2014-09-12 
      0.75       1.15       2.61       1.36       1.28       3.67 

or using L <- locf in the above we get the following (except for solution 3 which would have to be modified in an obvious way to use 6 rather than 3 components):

2012-11-01 2012-11-02 2012-11-06 2012-11-07 2012-11-08 2012-11-16 2012-11-21 
      7.52       1.27       3.44       1.29       1.35       0.66       1.28 
2012-12-05 2012-12-12 2012-12-20 2013-02-04 2013-02-05 2013-02-06 2013-02-07 
      1.17       0.59       3.41       4.30       3.11       2.55       2.65 
2013-02-11 2013-02-13 2013-02-14 2013-02-15 2013-02-19 2013-02-20 2013-02-22 
      0.00       2.87       3.14       3.24       3.41       4.04       4.19 
2013-02-26 2013-03-20 2013-04-02 2013-04-05 2013-04-10 2013-04-15 2013-04-29 
      4.34       4.44       0.66       1.20       1.30       1.29       1.30 
2013-05-10 2013-05-13 2013-05-15 2013-05-16 2013-05-22 2013-05-24 2013-06-04 
      1.27       0.77       0.69       1.16       0.55       0.00       1.26 
2013-06-11 2013-06-20 2013-07-01 2013-07-04 2013-07-11 2013-07-12 2013-07-15 
      1.23       1.28       1.23       1.17       0.66       0.91       1.18 
2013-07-19 2013-07-22 2013-07-23 2013-07-24 2013-07-25 2013-07-29 2013-08-09 
      0.66       1.29       1.35       1.45       1.53       1.61       1.82 
2013-08-13 2013-08-14 2013-08-15 2013-08-21 2013-09-05 2013-09-06 2013-09-11 
      1.00       1.80       1.89       1.80       1.81       1.78       0.99 
2013-09-27 2013-09-30 2013-10-01 2013-10-03 2013-10-04 2013-10-07 2013-10-08 
      1.68       2.18       1.68       1.56       1.93       1.84       1.69 
2013-10-09 2013-10-23 2013-10-25 2013-10-28 2013-10-30 2013-11-01 2013-11-05 
      1.18       1.73       1.18       1.72       1.83       1.90       1.99 
2013-11-07 2013-11-12 2013-11-13 2013-11-14 2013-11-18 2013-11-20 2013-11-22 
      1.93       1.87       1.96       2.10       2.22       2.33       2.38 
2013-12-06 2013-12-13 2013-12-20 2014-01-02 2014-01-07 2014-01-08 2014-01-14 
      2.35       2.23       2.16       2.18       2.17       2.20       2.29 
2014-01-21 2014-01-28 2014-01-30 2014-01-31 2014-02-05 2014-03-03 2014-03-04 
      2.27       2.28       2.42       2.48       2.99       2.56       2.65 
2014-03-05 2014-03-10 2014-03-17 2014-03-21 2014-03-24 2014-03-28 2014-03-31 
      2.69       3.21       2.70       2.80       2.79       2.80       2.78 
2014-04-03 2014-04-07 2014-04-09 2014-04-10 2014-04-11 2014-04-15 2014-04-17 
      2.26       2.78       2.26       2.12       2.07       1.97       1.84 
2014-04-22 2014-04-24 2014-05-22 2014-05-26 2014-05-30 2014-06-03 2014-06-27 
      1.77       1.18       1.70       1.78       1.91       1.98       1.93 
2014-07-01 2014-07-03 2014-07-07 2014-07-10 2014-07-22 2014-07-24 2014-08-08 
      0.98       1.83       1.76       1.18       0.50       0.75       1.15 
2014-08-14 2014-08-15 2014-08-19 2014-08-20 2014-08-27 2014-08-28 2014-09-04 
      2.61       1.01       0.97       1.36       2.14       0.69       0.56 
2014-09-12 

Updates Added additional solutions and re-arranged and expanded presentation.

Try (If the list elements are list of zoo objects and if you need to get the sum of the matching index).

 library(xts)
 library(zoo)

  z1 <- setNames(do.call(`merge`, lodf), paste0("Value", seq_along(lodf)))
  xts(data.frame(value=rowSums(z1, na.rm=TRUE)), order.by=index(z1))
  #            value
  #2012-11-01  1.00
  #2012-11-02  1.27
  #2013-07-12  0.91
  #2013-08-13  1.00
  #2013-09-11  0.99
  #2014-07-01  0.98
  #2014-07-22  0.50
  #2014-07-24  0.75
  #2014-08-08  1.15
  #2014-08-14  2.61
  #2014-08-20  1.36
  #2014-08-27  1.28
  #2014-09-12  3.67

If you need to use na.locf before summing

   z2 <- na.locf(z1)
   xts(data.frame(value=rowSums(z2, na.rm=TRUE)), order.by=index(z2))

data

  lodf <- list(structure(c(1.15, 1.32, 2.39), index = structure(c(16290, 
  16296, 16325), class = "Date"), class = "zoo"), structure(c(0.5, 
  0.75, 1.29, 1.36, 1.28, 1.28), index = structure(c(16273, 16275, 
  16296, 16302, 16309, 16325), class = "Date"), class = "zoo"), 
  structure(c(1, 1.27, 0.91, 1, 0.99, 0.98), index = structure(c(15645, 
  15646, 15898, 15930, 15959, 16252), class = "Date"), class = "zoo"))

With base R:

lodf = list(structure(list(`014-08-08` = 1.15, `2014-08-14` = 1.32, 
    `2014-09-12` = 2.39), .Names = c("014-08-08", "2014-08-14", 
"2014-09-12"), class = "data.frame", row.names = c(NA, -1L)), 
    structure(list(`2014-07-22` = 0.5, `2014-07-24` = 0.75, `2014-08-14` = 1.29, 
        `2014-08-20` = 1.36, `2014-08-27` = 1.28, `2014-09-12` = 1.28), .Names = c("2014-07-22", 
    "2014-07-24", "2014-08-14", "2014-08-20", "2014-08-27", "2014-09-12"
    ), class = "data.frame", row.names = c(NA, -1L)), structure(list(
        `2012-11-01` = 1, `2012-11-02` = 1.27, `2013-07-12` = 0.91, 
        `2013-08-13` = 1, `2013-09-11` = 0.99, `2014-07-01` = 0.98), .Names = c("2012-11-01", 
    "2012-11-02", "2013-07-12", "2013-08-13", "2013-09-11", "2014-07-01"
    ), class = "data.frame", row.names = c(NA, -1L)))

lodf
[[1]]
  014-08-08 2014-08-14 2014-09-12
1      1.15       1.32       2.39

[[2]]
  2014-07-22 2014-07-24 2014-08-14 2014-08-20 2014-08-27 2014-09-12
1        0.5       0.75       1.29       1.36       1.28       1.28

[[3]]
  2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01
1          1       1.27       0.91          1       0.99       0.98


ddf = data.frame(full=character(), stringsAsFactors=F)
ll = unlist(lapply(lodf, function(x) paste(names(x), x, sep='_')))
ddf[1:length(ll),1]=ll
ddf
              full
1   014-08-08_1.15
2  2014-08-14_1.32
3  2014-09-12_2.39
4   2014-07-22_0.5
5  2014-07-24_0.75
6  2014-08-14_1.29
7  2014-08-20_1.36
8  2014-08-27_1.28
9  2014-09-12_1.28
10    2012-11-01_1
11 2012-11-02_1.27
12 2013-07-12_0.91
13    2013-08-13_1
14 2013-09-11_0.99
15 2014-07-01_0.98


ddf$date = unlist(lapply(strsplit(ddf$full, '_'),function(x)x[1]))
ddf$value = as.numeric(unlist(lapply(strsplit(ddf$full, '_'),function(x)x[2])))
ddf = ddf[,-1]
ddf
         date value
1   014-08-08  1.15
2  2014-08-14  1.32
3  2014-09-12  2.39
4  2014-07-22  0.50
5  2014-07-24  0.75
6  2014-08-14  1.29
7  2014-08-20  1.36
8  2014-08-27  1.28
9  2014-09-12  1.28
10 2012-11-01  1.00
11 2012-11-02  1.27
12 2013-07-12  0.91
13 2013-08-13  1.00
14 2013-09-11  0.99
15 2014-07-01  0.98

Finally:

aggregate(value~date, ddf, sum)
         date value
1  2012.11.01  1.00
2  2012.11.02  1.27
3  2013.07.12  0.91
4  2013.08.13  1.00
5  2013.09.11  0.99
6  2014.07.01  0.98
7  2014.07.22  0.50
8  2014.07.24  0.75
9  2014.08.08  1.15
10 2014.08.14  2.61
11 2014.08.20  1.36
12 2014.08.27  1.28
13 2014.09.12  3.67

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM