简体   繁体   English

R:合并两个不规则时间序列

[英]R: merge two irregular time series

I have two multivariate time series x and y, both covering approximately the same range in time (one starts two years before the other, but they end on the same date).我有两个多元时间序列 x 和 y,它们的时间范围大致相同(一个比另一个早两年开始,但它们在同一日期结束)。 Both series have missing observations in the form of empty columns next to the date column, and also in the sense that one of the series has several dates that are not found in the other, and vice versa.两个系列都缺少日期列旁边的空列形式的观察结果,并且从某种意义上说,其中一个系列有多个日期在另一个系列中找不到,反之亦然。

I would like to create a data frame (or similar) with a column that lists all the dates found in x OR y, without duplicate dates.我想创建一个数据框(或类似的),其中有一列列出了在 x 或 y 中找到的所有日期,没有重复的日期。 For each date (row), I would like to horizontally stack the observations from x next to the observations from y, with NA's filling the missing cells.对于每个日期(行),我想将 x 的观察值水平堆叠在 y 的观察值旁边,NA 填充缺失的单元格。 Example:例子:

>x
"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9

>y
"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34

# result I would like
"1987-01-01"   7.1    NA   3   55.3   66   45
"1987-01-02"   5.2    5    2   NA     NA   NA
"1987-01-03"   NA     NA   NA  77.3   87   34
"1987-01-06"   2.3    NA   9   NA     NA   NA

What I have tried: with the zoo package, I've tried the merge.zoo method, but this seems to just stack the two series next to each other, with the dates (as numbers, eg "1987-01-02" shown as 6210) from each series appearing in two separate columns.我尝试过的:使用动物园 package,我尝试了 merge.zoo 方法,但这似乎只是将两个系列堆叠在一起,日期(作为数字,例如“1987-01-02”显示作为 6210)来自每个系列,出现在两个单独的列中。

I've sat for hours getting almost nowhere, so all help is appreciated.我已经坐了几个小时几乎一无所获,所以所有的帮助都是值得的。

EDIT: some code included below as per suggestion from Soumendra编辑:根据 Soumendra 的建议,下面包含一些代码

atcoa <- read.csv(file = "ATCOA_full_adj.csv", header = TRUE)
atcob <- read.csv(file = "ATCOB_full_adj.csv", header = TRUE)
atcoa$date <- as.Date(atcoa$date)
atcob$date <- as.Date(atcob$date)

# only number of observations and the observations themselves differ 
>str(atcoa)
'data.frame':   6151 obs. of  8 variables:
 $ date        :Class 'Date'  num [1:6151] 6210 6213 6215 6216 6217 ...
 $ max         : num  4.31 4.33 4.38 4.18 4.13 4.05 4.08 4.05 4.08 4.1 ...
 $ min         : num  4.28 4.31 4.28 4.13 4.05 3.95 3.97 3.95 4 4.02 ...
 $ close       : num  4.31 4.33 4.31 4.15 4.1 3.97 4 3.97 4.08 4.02 ...
 $ avg         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ tot.vol     : int  877733 89724 889437 1927113 3050611 846525 1782774 1497998 2504466 5636999 ...
 $ turnover    : num  3762300 388900 3835900 8015900 12468100 ...
 $ transactions: int  12 9 24 17 31 26 34 35 37 33 ...

>atcoa[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1 1987-01-02  4.31  4.28    4.31    NA    877733    3762300             12

# using timeSeries package
ts.atcoa <- timeSeries::as.timeSeries(atcoa, format = "%Y-%m-%d")
ts.atcob <- timeSeries::as.timeSeries(atcob, format = "%Y-%m-%d")

>str(ts.atcoa)
Time Series:          
 Name:               object
Data Matrix:        
 Dimension:          6151 7
 Column Names:       a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Row Names:          1970-01-01 01:43:30  ...  1970-01-01 04:12:35
Positions:          
 Start:              1970-01-01 01:43:30
 End:                1970-01-01 04:12:35
With:               
 Format:             %Y-%m-%d %H:%M:%S
 FinCenter:          GMT
 Units:              a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Title:              Time Series Object
 Documentation:      Wed Aug 17 13:00:50 2011

>ts.atcoa[1:1, ]
GMT
 a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 1970-01-01 01:43:30  4.31  4.28    4.31    NA    877733    3762300             12

# The following will create an object of class "data frame" and mode "list", which contains observations for the days mutual for the two series
>ts.atco <- timeSeries::merge(atcoa, atcob)  # produces same result as base::merge, apparently
>ts.atco[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions b.max b.min b.close b.avg b.tot.vol b.turnover b.transactions
1 1989-08-25  7.92  7.77    7.79    NA    269172    2119400             19  7.69  7.56    7.64    NA  81176693  593858000             12

EDIT: problem solved by (using zoo package)编辑:通过(使用动物园包)解决的问题

atcoa <- read.zoo(read.csv(file = "ATCOA_full_adj.csv", header = TRUE))
atcob <- read.zoo(read.csv(file = "ATCOB_full_adj.csv", header = TRUE))

names(atcoa) <- c("a.max", "a.min", "a.close",
                   "a.avg", "a.tot.vol", "a.turnover", "a.transactions")
names(atcob) <- c("b.max", "b.min", "b.close",
                   "b.avg", "b.tot.vol", "b.turnover", "b.transactions")

atco <- merge.zoo(atcoa, atcob)

Thank you all for your help.谢谢大家的帮助。

Try this:尝试这个:

Lines.x <- '"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9'

Lines.y <- '"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34'

library(zoo)
# in reality x might be in a file and might be read via: x <- read.zoo("x.dat")
# ditto for y. See ?read.zoo and the zoo-read vignette if you need other args too
x <- read.zoo(text = Lines.x)
y <- read.zoo(text = Lines.y)
merge(x,  y)

giving:给予:

           V2.x V3.x V4.x V2.y V3.y V4.y
1987-01-01  7.1   NA    3 55.3   66   45
1987-01-02  5.2    5    2   NA   NA   NA
1987-01-03   NA   NA   NA 77.3   87   34
1987-01-06  2.3   NA    9   NA   NA   NA

You can create a timeSeries (timeSeries library) object from your dates, merge them (timeSeries default merge behaviour is different from zoo and xts and does exactly what you are asking for) and then make zoo/xts objects out of the result in case you don't want to stay with timeSeries.您可以根据您的日期创建一个 timeSeries(timeSeries 库)object,合并它们(timeSeries 默认合并行为与 zoo 和 xts 不同,并且完全符合您的要求),然后从结果中生成 zoo/xts 对象,以防万一不想和 timeSeries 呆在一起。

One quick way to test is the following, assuming you have two zoo objects zz1 and zz2 -一种快速测试方法如下,假设您有两个动物园对象 zz1 和 zz2 -

library(timeSeries)
as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2)))

Compare the output of the above command with将上述命令的 output 与

merge(zz1, zz2)

You can also cbind -你也可以 cbind -

cbind(zz1, zz2)

provided there are no shared columns with same names.前提是没有同名的共享列。 Even if such column are there, you can choose the columns by which you cbind, and you will get a zoo object.即使有这样的列,你也可以选择你绑定的列,你会得到一个动物园 object。

cbind(zz1[, 1:2], zz2[, 2:3]) #Assuming other columns are common

here, i found a more generic aproach from stat.ethz.ch在这里,我从 stat.ethz.ch 找到了一个更通用的方法

a <- ts(1:10, start=c(2014,6), frequency=12)
b <- ts(1:12, start=c(2015,1), frequency=12)

library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))

to get a ts object back:得到一个 ts object 回来:

as.ts(m)

How about this:这个怎么样:

## Generate unique sorted time values.
i <- sort(unique(c(index(x), index(y))))

## Empty data matrix.
v <- matrix(nrow=length(i), ncol=6, NA)

## Pull in data items.
v[match(index(x), i), 1:3] <- coredata(x)
v[match(index(y), i), 4:6] <- coredata(y)

## Build new zoo object.
d <- zoo(v, order.by=i)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM