简体   繁体   中英

Is it possible to merge two .csv files to the nearest hour?

I have two .csv files called 'tag.csv' and 'wind.csv' that I would like to merge based on the 'Date' column. However, as you can see from the data below the times do not match exactly.

tag.csv

Date
13/12/2014 05:11
13/12/2014 05:43
13/12/2014 06:34

wind.csv

Date
13/12/2014 05:00
13/12/2014 06:00
13/12/2014 07:00

I am using the following simple script.

tag<- read.csv("tag.csv")
wind<- read.csv("wind.csv")
myfulldata = merge(tag, wind)

Is there a command I could add so that the data will merge with the closest time? For example, 13/12/2014 05:11 will merge with 13/12/2014 05:00

data.table package has its own merging method and an option called roll that allows closest matches. eg

library(data.table)
dt1 <- data.table(a = c(1,2,3,4), b=1:4)
dt2 <- data.table(a = c(1.6,2.1,3.2,4.9), c=1:4)
setkeyv(dt1, "a")
setkeyv(dt2, "a")
dt1[dt2, roll="nearest"]

See Join R data.tables where key values are not exactly equal--combine rows with closest times for another example.

To be robust though, I would probably instead do some datetime manipulation to force them to both be to the nearest hour, and then do a nearest join if all else fails

edit: this method won't work on data.frame objects, if you haven't used data.table before maybe stay simple and round the hours

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM