Perform Operations on the same day using XTS in R

I have what is hopefully a straightforward question. I have an xts object somewhat similar to the following:

                        | MarketPrice  |
----------------------------------------
2007-05-04 10:15:33.546 |   5.32       |
----------------------------------------
2007-05-04 10:16:42.100 |   5.31       |
----------------------------------------
2007-05-04 10:17:27.546 |   NA         |
----------------------------------------
2007-05-04 10:20:50.871 |   5.35       |
----------------------------------------
2007-05-04 10:21:38.652 |   5.37       |

Basically, I would like to find the index of the MarketPrice observation immediately before a given time, while omitting NA values. For instance, say we start at the time 2007-05-04 10:20:50.871, which has an index of 4 in the object. The Market Price immediately before this time is then 5.31, which has an index of 2 in the object. To perform this task I have written a function similar to the following:

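 MPFunction <- function(t, df) {
   ind <- t  # row we start from
   # Walk backwards until a row has a different timestamp and a non-NA price
   while (t > 1) {
     t <- t - 1
     if ((index(df[t]) != index(df[ind])) && !is.na(df[t, "MarketPrice"])) {
       return(t)
     }
   }
 }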

This performs the task: the first condition in the if statement makes sure the times in the index of the xts object are different, and the second condition makes sure there is no NA value in the MarketPrice column.
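As a quick check, the function can be run on the sample above. This is a minimal sketch that assumes the table is rebuilt by hand as an xts object named df with UTC timestamps; the object name and the time zone are assumptions, not part of the original data.

```r
library(xts)

# Sample data copied from the first table (time zone assumed to be UTC).
df <- xts(
  matrix(c(5.32, 5.31, NA, 5.35, 5.37), ncol = 1,
         dimnames = list(NULL, "MarketPrice")),
  order.by = as.POSIXct(c("2007-05-04 10:15:33.546",
                          "2007-05-04 10:16:42.100",
                          "2007-05-04 10:17:27.546",
                          "2007-05-04 10:20:50.871",
                          "2007-05-04 10:21:38.652"), tz = "UTC"))

# Same logic as the function above.
MPFunction <- function(t, df) {
  ind <- t
  while (t > 1) {
    t <- t - 1
    if (index(df[t]) != index(df[ind]) && !is.na(df[t, "MarketPrice"])) {
      return(t)
    }
  }
}

# Start at row 4 (10:20:50.871): row 3 is NA, so row 2 (5.31) is returned.
res <- MPFunction(4, df)
print(res)
```

Starting from row 1 there is nothing earlier to scan, so the function falls out of the loop and returns NULL.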

However, I now run into an issue when I look at several days. Say I now have an xts object as follows:

                          | MarketPrice  |
  ----------------------------------------
  2007-05-03 16:59:58.921 |   5.32       |
  ----------------------------------------
  2007-05-04 10:12:27.546 |   NA         |
  ----------------------------------------
  2007-05-04 10:20:50.871 |   5.35       |
  ----------------------------------------

If I start at index 3 (i.e. at the time 2007-05-04 10:20:50.871) and look for the first earlier index that doesn't have an NA value in the MarketPrice column, the function goes to index 1, which is 2007-05-03 16:59:58.921. The problem, however, is that this is on a different day, and I want to make sure that I only extract the index of MarketPrice values on the same day.

Basically, I was wondering if there is a quick modification I can make to the if statement in my MPFunction above that will stop it from returning the index of a MarketPrice from the previous day. Also, I do not wish to split the xts object up by day, since that would complicate things quite a bit.

Now, I already have several ideas on how to solve this (such as using the strptime function to check dates), but these are all time-consuming methods, so I was hoping to find a method that is much faster. If anyone has any ideas I'd appreciate it. Thanks in advance.

Sounds like you actually want to use split.xts (why is using split a complication? It shouldn't be, even with large amounts of tick data in each day), and then recombine the results:

library(xts)

# Two days of sample ticks; the last five rows are the data from the question
zz <- xts(order.by = as.POSIXct(c("2007-05-03 09:59:58.921",
                                  "2007-05-03 10:03:58.921",
                                  "2007-05-03 12:03:58.921",
                                  "2007-05-04 10:15:33.546",
                                  "2007-05-04 10:16:42.100",
                                  "2007-05-04 10:17:27.546",
                                  "2007-05-04 10:20:50.871",
                                  "2007-05-04 10:21:38.652")),
          x = c(3, 4, 9, 5.32, 5.31, NA, 5.35, 5.37),
          dimnames = list(NULL, "MarketPrice"))

> zz
#                      MarketPrice
# 2007-05-03 09:59:58        3.00
# 2007-05-03 10:03:58        4.00
# 2007-05-03 12:03:58        9.00
# 2007-05-04 10:15:33        5.32
# 2007-05-04 10:16:42        5.31
# 2007-05-04 10:17:27          NA
# 2007-05-04 10:20:50        5.35
# 2007-05-04 10:21:38        5.37


MPFunction <- function(x, time_window = "T10/T10:16:40") {
  # Row positions (within x) that fall inside the intraday time window
  u <- x[time_window, which.i = TRUE]

  if (length(u) > 0) {
    # Keep only the positions whose MarketPrice is not NA. The NA test is done
    # on x[u, ] so the positions stay aligned even when the window does not
    # start at the first row of the day.
    u2 <- u[!is.na(as.numeric(x[u, "MarketPrice"]))]
    if (length(u2) > 0) {
      # One-row xts: timestamp and position of the last non-NA price in the window
      v <- xts(order.by = end(x[last(u2)]), x = last(u2), dimnames = list(NULL, "index.i"))
    } else {
      v <- NULL
    }
  } else {
    v <- NULL
  }
  v
}

# use T0/ as the start of the time window in each day for getting the index value by default. You can change this though.
chosen_window = "T0/T10:17:29"

by_day <- lapply(split(zz, f = "day"), FUN = MPFunction, time_window = chosen_window)

rr <- do.call(rbind, by_day)

> rr
#                     index.i
# 2007-05-03 10:03:58       2
# 2007-05-04 10:16:42       2

If there are no values on a given day within the time_window of interest, MPFunction returns NULL for that day, and that day contributes no row to the output (rr).
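If splitting really is unwanted, the backwards scan from the question can also be kept and told to stop at the day boundary. The following is only a sketch under assumptions: prev_same_day is a hypothetical helper name, the day key is built with base R's format() on the xts index (so it follows the index's time zone), and the three-row sample is rebuilt by hand with UTC timestamps.

```r
library(xts)

# Hypothetical helper (not from the question or the answer above): the same
# backwards scan, but it gives up as soon as it leaves the starting day.
# `days` can be precomputed once and passed in when calling for many rows.
prev_same_day <- function(t, df, days = format(index(df), "%Y-%m-%d")) {
  ind <- t
  times <- index(df)
  price <- as.numeric(df[, "MarketPrice"])
  while (t > 1) {
    t <- t - 1
    if (days[t] != days[ind]) break  # crossed into an earlier day
    if (times[t] != times[ind] && !is.na(price[t])) return(t)
  }
  NA  # no earlier non-NA price on the same day
}

# Three-row sample from the question (time zone assumed to be UTC).
df2 <- xts(
  matrix(c(5.32, NA, 5.35), ncol = 1,
         dimnames = list(NULL, "MarketPrice")),
  order.by = as.POSIXct(c("2007-05-03 16:59:58.921",
                          "2007-05-04 10:12:27.546",
                          "2007-05-04 10:20:50.871"), tz = "UTC"))

# Row 2 is NA and row 1 is on the previous day, so nothing qualifies.
res <- prev_same_day(3, df2)
print(res)
```

Computing the day strings once for the whole object and comparing them inside the loop avoids re-parsing dates on every iteration, which is the cost the question is worried about.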

