Merge datasets in Python or R using datetime variable

I have two different datasets that need merging.

  • First dataset has the data per minute
  • Second dataset has the data per hour.

I would like to aggregate the first dataset from minutes to hours, so that all the rows for 01/12/2020 00:00, 01/12/2020 00:01, 01/12/2020 00:02, ..., 01/12/2020 00:59 are combined under 01/12/2020 00:00.

How can I achieve this?

Since you are coming from R: if you want to use Python for tabular data, have a look at pandas, which offers a comprehensive set of tools for processing such datasets.

There, I think you are looking for pandas.Series.dt.floor, which floors each timestamp to the given time unit, here hours:

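import pandas as pd

# Example timestamps that all fall within the same hour
series = pd.Series(
    ['2020-01-01 12:01:00', '2020-01-01 12:02:00', '2020-01-01 12:30:00', '2020-01-01 12:59:00'],
    name='timestamp',
    dtype="datetime64[ns]"
)
# Floor each timestamp to the start of its hour
# ('h' is the hourly alias; the uppercase 'H' is deprecated in recent pandas versions)
series.dt.floor('h')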

This will return

0   2020-01-01 12:00:00
1   2020-01-01 12:00:00
2   2020-01-01 12:00:00
3   2020-01-01 12:00:00
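
To get from flooring to the actual aggregation and merge, a minimal sketch could look like the following. The column names (timestamp, value, info), the sample values, and the choice of the mean as the aggregation are assumptions for illustration; the strings use the day/month/year format from the question.

```python
import pandas as pd

# Hypothetical per-minute dataset (day/month/year timestamps as in the question)
minute_df = pd.DataFrame({
    "timestamp": ["01/12/2020 00:00", "01/12/2020 00:01",
                  "01/12/2020 00:59", "01/12/2020 01:30"],
    "value": [1.0, 2.0, 3.0, 10.0],
})

# Hypothetical per-hour dataset
hour_df = pd.DataFrame({
    "timestamp": ["01/12/2020 00:00", "01/12/2020 01:00"],
    "info": ["a", "b"],
})

# Parse the strings explicitly as day/month/year hour:minute
fmt = "%d/%m/%Y %H:%M"
minute_df["timestamp"] = pd.to_datetime(minute_df["timestamp"], format=fmt)
hour_df["timestamp"] = pd.to_datetime(hour_df["timestamp"], format=fmt)

# Floor every minute timestamp to its hour, then aggregate per hour.
# The mean is an assumption; use sum, max, etc. as your data requires.
hourly = (
    minute_df
    .assign(timestamp=minute_df["timestamp"].dt.floor("h"))
    .groupby("timestamp", as_index=False)["value"]
    .mean()
)

# Both frames now share hourly timestamps, so they can be merged on that column
merged = hourly.merge(hour_df, on="timestamp", how="inner")
print(merged)
```

With how="inner", only hours present in both datasets are kept; how="left" or how="outer" would keep unmatched hours as well.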

Adding to @nehalem's Python solution, here is one using R. The lubridate package is wonderful and makes working with dates and times easy.

First, convert your character vector of date-time strings into a date-time object.

library(lubridate)
series <- c("01/12/2020 00:00", "01/12/2020 00:01", "01/12/2020 00:02", "01/12/2020 00:59", "01/12/2020 04:20")
series2 <- parse_date_time(series, "d m y H M") # specify current format of your data

This will give:

> series2
[1] "2020-12-01 00:00:00 UTC" "2020-12-01 00:01:00 UTC" "2020-12-01 00:02:00 UTC"
[4] "2020-12-01 00:59:00 UTC" "2020-12-01 04:20:00 UTC"

Finally, round the timestamps down to the hour:

> series3 <- floor_date(series2, "hour")
> series3
[1] "2020-12-01 00:00:00 UTC" "2020-12-01 00:00:00 UTC" "2020-12-01 00:00:00 UTC"
[4] "2020-12-01 00:00:00 UTC" "2020-12-01 04:00:00 UTC"

Additionally, the documentation describes options to change, among other things, the time zone and the time format, depending on your requirements.
