Perform operations on the same day using xts in R
I have what is hopefully a straightforward question. I have an xts object somewhat similar to the following:
                        | MarketPrice |
----------------------------------------
2007-05-04 10:15:33.546 |        5.32 |
2007-05-04 10:16:42.100 |        5.31 |
2007-05-04 10:17:27.546 |          NA |
2007-05-04 10:20:50.871 |        5.35 |
2007-05-04 10:21:38.652 |        5.37 |
Basically, I would like to find the index of the MarketPrice immediately before a given time, while also omitting NA values. For instance, say we start at the time 2007-05-04 10:20:50.871, which has an index of 4 in the object. The MarketPrice immediately before this time (skipping the NA at index 3) is then 5.31, which has an index of 2 in the object. To perform this task I have written a function similar to the following:
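library(xts)

# Walk backwards from row t and return the first earlier row whose
# timestamp differs from row t's and whose MarketPrice is not NA
MPFunction <- function(t, df) {
  ind <- t
  while (t > 1) {
    t <- t - 1
    if ((index(df[t]) != index(df[ind])) && !(is.na(df[t, "MarketPrice"]))) {
      return(t)
    }
  }
  # If no such row exists, the loop ends and the function returns NULL
}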
This works because the first condition in the if statement ensures that the time in the index of the xts object differs from the starting time, and the second condition ensures that the value in the MarketPrice column is not NA.
However, I now run into an issue when I look at several days. Let's say I now have an xts object as follows:
                        | MarketPrice |
----------------------------------------
2007-05-03 16:59:58.921 |        5.32 |
2007-05-04 10:12:27.546 |          NA |
2007-05-04 10:20:50.871 |        5.35 |
If I start at index 3 (i.e. at the time 2007-05-04 10:20:50.871) and look for the first earlier index that doesn't have an NA value in the MarketPrice column, the function goes to index 1, which is 2007-05-03 16:59:58.921. The problem, however, is that this is on a different day, and I want to make sure that I only extract the index of MarketPrice values on the same day.
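To see the cross-day behaviour without needing xts, here is a minimal base-R transcription of the loop in MPFunction. It is a sketch that assumes the xts index is passed as a POSIXct vector `times` and the MarketPrice column as a numeric vector `prices`; the name `prev_valid_index` is hypothetical, and unlike MPFunction it returns NA rather than NULL when nothing is found.

```r
# Hypothetical base-R transcription of the question's MPFunction loop.
# times: POSIXct vector (the xts index); prices: numeric MarketPrice values.
prev_valid_index <- function(start, times, prices) {
  t <- start
  while (t > 1) {
    t <- t - 1
    # Same two tests as MPFunction: a different timestamp and a non-NA price
    if (times[t] != times[start] && !is.na(prices[t])) {
      return(t)
    }
  }
  NA_integer_  # no qualifying row (MPFunction itself returns NULL here)
}

# Single-day example from the question
times1 <- as.POSIXct(c("2007-05-04 10:15:33.546", "2007-05-04 10:16:42.100",
                       "2007-05-04 10:17:27.546", "2007-05-04 10:20:50.871",
                       "2007-05-04 10:21:38.652"), tz = "UTC")
prices1 <- c(5.32, 5.31, NA, 5.35, 5.37)
prev_valid_index(4, times1, prices1)  # 2: skips the NA at row 3

# Two-day example from the question
times2 <- as.POSIXct(c("2007-05-03 16:59:58.921", "2007-05-04 10:12:27.546",
                       "2007-05-04 10:20:50.871"), tz = "UTC")
prices2 <- c(5.32, NA, 5.35)
prev_valid_index(3, times2, prices2)  # 1: a row from the previous day
```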
Basically, I was wondering if there is a quick modification I can make to the if statement in my MPFunction above that would prevent it from finding the index of a MarketPrice from the previous day. Also, I do not wish to split the xts object up by day, since that would complicate things quite a bit.
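One quick modification along the lines asked for is to compare calendar dates and stop scanning as soon as the loop crosses into an earlier day. The sketch below again works on plain vectors rather than on the xts object; `prev_valid_index_same_day` is a hypothetical name, and it assumes that "day" means the calendar date in UTC (pass your own `days` vector otherwise). Computing the dates once with `as.Date()` and reusing them keeps the per-row check to a cheap comparison.

```r
# Hypothetical same-day variant of the backward scan.
# days defaults to UTC calendar dates; precompute it and pass it in when
# calling this for many rows so that as.Date() runs only once.
prev_valid_index_same_day <- function(start, times, prices,
                                      days = as.Date(times, tz = "UTC")) {
  t <- start
  while (t > 1) {
    t <- t - 1
    # Rows are time-ordered, so once an earlier day is reached we can stop
    if (days[t] != days[start]) break
    if (times[t] != times[start] && !is.na(prices[t])) {
      return(t)
    }
  }
  NA_integer_  # nothing suitable earlier on the same day
}

times2 <- as.POSIXct(c("2007-05-03 16:59:58.921", "2007-05-04 10:12:27.546",
                       "2007-05-04 10:20:50.871"), tz = "UTC")
prices2 <- c(5.32, NA, 5.35)
prev_valid_index_same_day(3, times2, prices2)  # NA: row 1 is on 2007-05-03
```

Inside MPFunction itself, the analogous change would presumably be an extra test such as `as.Date(index(df[t])) == as.Date(index(df[ind]))` (minding the time zone `as.Date()` uses), breaking out of the loop when it fails.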
Now, I already have several ideas on how to solve this (such as using the strptime function to check dates), but these are all time-consuming methods, so I was hoping to find a much faster one. If anyone has any ideas I'd appreciate it. Thanks in advance.
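Since speed is the concern, another option is to drop the per-row loop and compute the answer for every row at once. This is a swapped-in vectorized technique, not the approach from the question or the answer: a running maximum (`cummax`) of the positions of non-NA prices gives the last valid row up to each point, `match()` finds the first row sharing each timestamp so that identical timestamps are skipped, and a final date comparison discards candidates from an earlier day. The function name is hypothetical, timestamps are assumed to be sorted (as an xts index is), and days are taken as UTC calendar dates.

```r
# Hypothetical vectorized version: for every row, the index of the last
# earlier row on the same calendar day with a different timestamp and a
# non-NA price (NA if there is none). Assumes times are sorted.
prev_valid_all <- function(times, prices, tz = "UTC") {
  n <- length(times)
  days <- as.Date(times, tz = tz)
  # Last non-NA row at or before each position (0 when there is none yet)
  last_valid <- cummax(ifelse(is.na(prices), 0L, seq_len(n)))
  # First row sharing each row's timestamp; compare as numbers so that
  # fractional seconds are respected
  secs <- as.numeric(times)
  first_same <- match(secs, secs)
  # Last non-NA row strictly before the current timestamp:
  # last_valid[first_same - 1], with 0 standing in for "none"
  cand <- c(0L, last_valid)[first_same]
  # Keep the candidate only if it falls on the same calendar day
  ok <- cand > 0L
  ok[ok] <- days[cand[ok]] == days[ok]
  ifelse(ok, cand, NA_integer_)
}

times <- as.POSIXct(c("2007-05-03 16:59:58.921", "2007-05-04 10:12:27.546",
                      "2007-05-04 10:20:50.871", "2007-05-04 10:21:38.652"),
                    tz = "UTC")
prices <- c(5.32, NA, 5.35, 5.37)
prev_valid_all(times, prices)  # returns c(NA, NA, NA, 3)
```

With an xts object, `times` and `prices` would correspond to `index(df)` and `coredata(df)[, "MarketPrice"]`.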
It sounds like you actually want to use split.xts (why is using split a complication? It shouldn't be, even with large amounts of tick data in each day) and then recombine the results:
library(xts)

zz <- xts(order.by = as.POSIXct(c("2007-05-03 09:59:58.921",
                                  "2007-05-03 10:03:58.921",
                                  "2007-05-03 12:03:58.921",
                                  "2007-05-04 10:15:33.546",
                                  "2007-05-04 10:16:42.100",
                                  "2007-05-04 10:17:27.546",
                                  "2007-05-04 10:20:50.871",
                                  "2007-05-04 10:21:38.652")),
          x = c(3, 4, 9, 5.32, 5.31, NA, 5.35, 5.37),
          dimnames = list(NULL, "MarketPrice"))
> zz
#                     MarketPrice
# 2007-05-03 09:59:58        3.00
# 2007-05-03 10:03:58        4.00
# 2007-05-03 12:03:58        9.00
# 2007-05-04 10:15:33        5.32
# 2007-05-04 10:16:42        5.31
# 2007-05-04 10:17:27          NA
# 2007-05-04 10:20:50        5.35
# 2007-05-04 10:21:38        5.37
MPFunction <- function(x, time_window = "T10/T10:16:40") {
  # Row positions (within x) of the observations inside the time-of-day window
  u <- x[time_window, which.i = TRUE]
  if (length(u) > 0) {
    # Drop positions whose MarketPrice is NA; indexing by u keeps the
    # positions aligned with x even when the window does not start at T0
    u2 <- u[!is.na(coredata(x)[u, "MarketPrice"])]
    if (length(u2) > 0) {
      # last() picks the final position, i.e. the last non-NA row in the window
      v <- xts(order.by = end(x[last(u2)]), x = last(u2), dimnames = list(NULL, "index.i"))
    } else {
      v <- NULL
    }
  } else {
    v <- NULL
  }
  v
}
# Use T0/ as the start of the time window so that each day is searched from
# the start of the day; change the window to suit your needs.
chosen_window <- "T0/T10:17:29"
by_day <- lapply(split(zz, f = "day"), FUN = MPFunction, time_window = chosen_window)
rr <- do.call(rbind, by_day)
> rr
# index.i
# 2007-05-03 10:03:58 2
# 2007-05-04 10:16:42 2
If a day has no values in the time_window of interest, MPFunction returns NULL for that day, and nothing is returned in the output (rr) for that day.
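For readers without xts at hand, the same split/apply/combine pattern can be sketched in base R. This is an illustrative analogue rather than the answer's code: `last_valid_per_day` is a hypothetical name, days and times of day are taken in UTC, and the cutoff is given in seconds after midnight instead of an xts time-of-day string. Like MPFunction above, it returns NULL for a day with no usable rows, and `do.call(rbind, ...)` then simply leaves that day out.

```r
# Hypothetical base-R analogue of split(zz, f = "day") + lapply + rbind.
# For each day, return the within-day position of the last non-NA price
# at or before cutoff_secs seconds after midnight (UTC).
last_valid_per_day <- function(times, prices, cutoff_secs, tz = "UTC") {
  lt <- as.POSIXlt(times, tz = tz)
  secs_of_day <- lt$hour * 3600 + lt$min * 60 + lt$sec
  days <- format(times, "%Y-%m-%d", tz = tz)
  per_day <- lapply(split(seq_along(times), days), function(i) {
    keep <- i[secs_of_day[i] <= cutoff_secs & !is.na(prices[i])]
    if (length(keep) == 0) return(NULL)  # this day is dropped by rbind
    j <- keep[length(keep)]
    data.frame(time = times[j], index.i = j - i[1] + 1)
  })
  do.call(rbind, per_day)
}

times <- as.POSIXct(c("2007-05-03 09:59:58.921", "2007-05-03 10:03:58.921",
                      "2007-05-03 12:03:58.921", "2007-05-04 10:15:33.546",
                      "2007-05-04 10:16:42.100", "2007-05-04 10:17:27.546",
                      "2007-05-04 10:20:50.871", "2007-05-04 10:21:38.652"),
                    tz = "UTC")
prices <- c(3, 4, 9, 5.32, 5.31, NA, 5.35, 5.37)
rr <- last_valid_per_day(times, prices, cutoff_secs = 10 * 3600 + 17 * 60 + 29)
rr$index.i  # 2 2, matching the xts result above
```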