简体   繁体   中英

Separating time and date values from a timestamp/ DateTime column in an ffdf

I am a relatively new R user and this is my first question on StackOverflow, so apologies if my question is unclear or obviously stated somewhere else.

I have a large dataset (7.8 GB, 137 million observations) that I have loaded into R in a ffdf format as my understanding is that this will allow me to manipulate the data (with the aim of reducing it to a manageable size) without crashing my computer.

My dataset consists of six features, one of which is a timestamp in the format "2012-10-12 00:30:00 BST". As each observation (electricity readings) is taken at exactly every half hour interval, I would like to categorise the data by which of the 48 half hours in the day the observation takes place. As a first step I would therefore like to separate out the date and the time from the timestamp. (The aim after that is to code this time column from 1-48 for each half hour.)

The following code worked to create a new date column:

ff1$date <- as.character(as.Date(ff1$DateTime))

However, I am struggling to do the same for time and have tried a number of methods based on perhaps crude copying from other examples.

(1) ff1$time <- as.POSIXct(strptime(as.character(ff1$DateTime),"%T"))

(2) ff1$time <- strptime(ff1$DateTime,"%Y-%m-%d %H:%M:%S")

(3) ff1$time <- sapply(strptime(as.character(ff1$DateTime)," "), "[", 2)

None of these work. The errors for each of the three lines above are:

(1) Error in strptime(as.character(ff1$DateTime), "%T") : invalid 'x' argument

(2) Error in strptime(ff1$DateTime, "%Y-%m-%d %H:%M:%S") : invalid 'x' argument

(3) Error in strptime(as.character(ff1$DateTime), " ") : invalid 'x' argument

Is this because the data is in fdff format? Are there other ways of doing this?

Many thanks in advance!

Arjun

dput:

structure(list(LCLid = structure(c(1L, 1L, 1L, 1L), .Label = "MAC000002", class = "factor"), 
    stdorToU = structure(c(1L, 1L, 1L, 1L), .Label = "Std", class = "factor"), 
    DateTime = structure(c(1349998200, 1.35e+09, 1350001800, 
    1350003600), tzone = "", class = c("POSIXct", "POSIXt")), 
    KWH.hh..per.half.hour. = structure(c(1L, 1L, 1L, 1L), .Label = " 0 ", class = "factor"), 
    Acorn = structure(c(1L, 1L, 1L, 1L), .Label = "ACORN-A", class = "factor"), 
    Acorn_grouped = structure(c(1L, 1L, 1L, 1L), .Label = "Affluent", class = "factor"), 
    date = structure(c(1L, 2L, 2L, 2L), .Label = c("2012-10-11", 
    "2012-10-12"), class = "factor")), row.names = c("1", "2", 
"3", "4"), class = "data.frame")

headers of relevant columns:

      LCLid            DateTime
1 MAC000002 2012-10-12 00:30:00
2 MAC000002 2012-10-12 01:00:00
3 MAC000002 2012-10-12 01:30:00
4 MAC000002 2012-10-12 02:00:00
5 MAC000002 2012-10-12 02:30:00
6 MAC000002 2012-10-12 03:00:00

The code you are trying is giving errors probably because the column "DateTime is not of class "POSIXt" , "POSIXct" . So first coerce to a date/time class, then extract the time only.

ff1$DateTime <- as.POSIXct(ff1$DateTime)
format(ff1$DateTime, format = "%T")
#[1] "00:30:00"

Edit.

If the above gives an error try

ff1$DateTime <- as.POSIXct(as.character(ff1$DateTime))
format(ff1$DateTime, format = "%T")

Data.

ff1 <- data.frame(DateTime = "2012-10-12 00:30:00 BST")

If you use dates and times a lot, lubridate may become helpful. Here I use ymd_hms() to convert the y ear- m onth- d ay h our- m inute- s econd format into an actual datetime. Then use format.

This is not materially different than the other solutions, just a different way of converting back to a datetime.

Code:

library(lubridate)

ff1$time <- format(ymd_hms(ff1$DateTime), format = "%H:%M:%S")

Result:

> ff1
      LCLid stdorToU            DateTime KWH.hh..per.half.hour.   Acorn Acorn_grouped       date     time
1 MAC000002      Std 2012-10-11 19:30:00                     0  ACORN-A      Affluent 2012-10-11 19:30:00
2 MAC000002      Std 2012-10-11 20:00:00                     0  ACORN-A      Affluent 2012-10-12 20:00:00
3 MAC000002      Std 2012-10-11 20:30:00                     0  ACORN-A      Affluent 2012-10-12 20:30:00
4 MAC000002      Std 2012-10-11 21:00:00                     0  ACORN-A      Affluent 2012-10-12 21:00:00

You could use strsplit .

sapply(strsplit(as.character(dat$x), " "), `[`, 1)
# [1] "2012-10-12" "2012-10-12" "2012-10-12" "2012-10-12" "2012-10-12"
sapply(strsplit(as.character(dat$x), " "), `[`, 2)
# [1] "00:30:00" "00:30:00" "00:30:00" "00:30:00" "00:30:00"

Data:

x <- "2012-10-12 00:30:00 BST"
dat <- data.frame(x=replicate(5, x))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM