R merge data.frames asof join

I have a whole bunch of data.frames with irregular time spacing.

I would like to make a new data.frame and join the others to it, picking for each data.frame being joined its latest value at or before each time in the new data.frame.

For example, listOfDataFrames below contains a list of data.frames, each of which has a Time column in seconds. I find the total range, integer-divide its endpoints by 60, and build a sequence from them to obtain an increasing sequence of whole minutes. Now I need to left-join the list of data.frames onto this new sequence. E.g. if the value in mypoints is 60, the value joined to it should be the latest value with Time <= 60.

xrange <- range(lapply(listOfDataFrames,function(x) range(x$Time)))
mypoints <- 60*do.call(seq,as.list(xrange%/%60))

I believe this is sometimes called an asof join.
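To make the term concrete, here is a minimal sketch of the lookup for a single target time. The helper name asof_value and the toy vectors are made up for illustration and are not part of my real data.

```r
# Hypothetical helper: value of the latest observation with time <= t,
# or NA when no observation is that early.
asof_value <- function(times, values, t) {
  idx <- which(times <= t)
  if (length(idx) == 0) return(NA)
  # Among the candidates, keep the one with the largest time
  values[idx[which.max(times[idx])]]
}

# Toy data, invented for this example
times  <- c(12.5, 58.0, 60.0, 75.3)
values <- c(1, 2, 3, 4)

asof_value(times, values, 60)   # uses the observation at 60.0
asof_value(times, values, 59)   # uses the observation at 58.0
asof_value(times, values, 10)   # nothing at or before 10
```

The join I am after does this for every element of mypoints and for every data.frame in the list.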

Is there a simple procedure to do this?

Thanks

EDIT: this is what I currently use

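xrange <- range(lapply(listOfDataFrames, function(x) range(x$Time)))
# Grid of whole minutes covering the full time range
mypoints <- 60 * seq(xrange[1] %/% 60, 1 + xrange[2] %/% 60)
result <- data.frame(Time = mypoints)
for (index in seq_along(listOfDataFrames))
{
  x <- listOfDataFrames[[index]]
  # Position of each grid point among the merged, sorted times minus its rank
  # in the grid = number of observations before it (assumes x$Time is sorted
  # and never exactly equal to a grid point)
  indices <- which(sort(c(mypoints, x$Time)) %in% mypoints) - seq_along(mypoints)
  indices[indices == 0] <- NA   # no earlier observation
  newdf <- data.frame(new = x$Result[indices])
  colnames(newdf) <- paste("S", index, sep = "")
  result <- cbind(result, newdf)
}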

EDIT: full example

AsOfJoin <- function (listOfDataFrames) {
  xrange <- range(lapply(listOfDataFrames, function(x) range(x$Time)))
  # Grid of whole minutes covering the full time range
  mypoints <- 60 * seq(xrange[1] %/% 60, 1 + xrange[2] %/% 60)
  result <- data.frame(Time = mypoints)
  for (index in seq_along(listOfDataFrames))
  {
    x <- listOfDataFrames[[index]]
    # Number of observations before each grid point (assumes x$Time is sorted
    # and never exactly equal to a grid point)
    indices <- which(sort(c(mypoints, x$Time)) %in% mypoints) - seq_along(mypoints)
    indices[indices == 0] <- NA   # no earlier observation
    newdf <- data.frame(new = x$Result[indices])
    colnames(newdf) <- paste("S", index, sep = "")
    result <- cbind(result, newdf)
  }
  result[is.na(result)] <- 0      # report missing values as 0
  result
}


a<-data.frame(Time=c(28947.5,28949.6,29000),Result=c(10,15,9))
b<-data.frame(Time=c(28947.8,28949.5),Result=c(14,19))
listOfDataFrames <- list(a,b)
result<-AsOfJoin(listOfDataFrames)

    > a
         Time Result
    1 28947.5     10
    2 28949.6     15
    3 29000.0      9
    > b
         Time Result
    1 28947.8     14
    2 28949.5     19
    > result
       Time S1 S2
    1 28920  0  0
    2 28980 15 19
    3 29040  9 19

data.table provides very fast asof joins out of the box. See also this post for an example.
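As a rough illustration only, a rolling join on the question's data.frame a might look like the sketch below. It assumes the data.table package is installed; the name grid is made up and holds the whole-minute points from the question.

```r
library(data.table)

a <- data.table(Time = c(28947.5, 28949.6, 29000), Result = c(10, 15, 9))
grid <- data.table(Time = c(28920, 28980, 29040))   # whole-minute points

# For each row of grid, take the last row of a with Time <= grid$Time
# (roll = TRUE carries the last observation forward).
joined <- a[grid, on = "Time", roll = TRUE]
joined
```

Grid points with no earlier observation come back with NA in Result; repeating the join for each data.frame in the list would give one column per series.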

See my edit for the answer. Apparently that is the best way.
