I am a relatively new R user and this is my first question on StackOverflow, so apologies if my question is unclear or obviously stated somewhere else.
I have a large dataset (7.8 GB, 137 million observations) that I have loaded into R in ffdf format, as my understanding is that this will let me manipulate the data (with the aim of reducing it to a manageable size) without crashing my computer.
My dataset consists of six features, one of which is a timestamp in the format "2012-10-12 00:30:00 BST". As each observation (an electricity reading) is taken exactly every half hour, I would like to categorise the data by which of the 48 half hours in the day the observation falls in. As a first step I would therefore like to separate the date and the time out of the timestamp. (The aim after that is to code this time column from 1 to 48, one value per half hour.)
The following code worked to create a new date column:
ff1$date <- as.character(as.Date(ff1$DateTime))
However, I am struggling to do the same for time and have tried a number of methods based on perhaps crude copying from other examples.
(1) ff1$time <- as.POSIXct(strptime(as.character(ff1$DateTime),"%T"))
(2) ff1$time <- strptime(ff1$DateTime,"%Y-%m-%d %H:%M:%S")
(3) ff1$time <- sapply(strptime(as.character(ff1$DateTime)," "), "[", 2)
None of these work. The errors for each of the three lines above are:
(1) Error in strptime(as.character(ff1$DateTime), "%T") : invalid 'x' argument
(2) Error in strptime(ff1$DateTime, "%Y-%m-%d %H:%M:%S") : invalid 'x' argument
(3) Error in strptime(as.character(ff1$DateTime), " ") : invalid 'x' argument
Is this because the data is in ffdf format? Are there other ways of doing this?
Many thanks in advance!
Arjun
dput:
structure(list(LCLid = structure(c(1L, 1L, 1L, 1L), .Label = "MAC000002", class = "factor"),
stdorToU = structure(c(1L, 1L, 1L, 1L), .Label = "Std", class = "factor"),
DateTime = structure(c(1349998200, 1.35e+09, 1350001800,
1350003600), tzone = "", class = c("POSIXct", "POSIXt")),
KWH.hh..per.half.hour. = structure(c(1L, 1L, 1L, 1L), .Label = " 0 ", class = "factor"),
Acorn = structure(c(1L, 1L, 1L, 1L), .Label = "ACORN-A", class = "factor"),
Acorn_grouped = structure(c(1L, 1L, 1L, 1L), .Label = "Affluent", class = "factor"),
date = structure(c(1L, 2L, 2L, 2L), .Label = c("2012-10-11",
"2012-10-12"), class = "factor")), row.names = c("1", "2",
"3", "4"), class = "data.frame")
LCLid DateTime
1 MAC000002 2012-10-12 00:30:00
2 MAC000002 2012-10-12 01:00:00
3 MAC000002 2012-10-12 01:30:00
4 MAC000002 2012-10-12 02:00:00
5 MAC000002 2012-10-12 02:30:00
6 MAC000002 2012-10-12 03:00:00
The code you are trying is probably giving errors because the column "DateTime" is not of class "POSIXt", "POSIXct". So first coerce it to a date/time class, then extract only the time.
ff1$DateTime <- as.POSIXct(ff1$DateTime)
format(ff1$DateTime, format = "%T")
#[1] "00:30:00"
Edit.
If the above gives an error try
ff1$DateTime <- as.POSIXct(as.character(ff1$DateTime))
format(ff1$DateTime, format = "%T")
Data.
ff1 <- data.frame(DateTime = "2012-10-12 00:30:00 BST")
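Once the column is a proper date/time, the 1-48 coding mentioned in the question can be derived from the hour and minute. The following is only a minimal sketch in base R: the data frame `df_demo` and the column name `half_hour` are made up for illustration, the time zone is fixed to "UTC" so the result does not depend on the local machine, and it assumes that 00:00 counts as slot 1 (shift by one if your numbering starts at 00:30). It runs on an ordinary data.frame, not on an ffdf.

```r
# Hypothetical sample data in the same layout as the question's timestamps
df_demo <- data.frame(
  DateTime = c("2012-10-12 00:00:00", "2012-10-12 00:30:00",
               "2012-10-12 13:00:00", "2012-10-12 23:30:00"),
  stringsAsFactors = FALSE
)

# Parse with an explicit format and a fixed time zone (assumption: UTC)
dt <- as.POSIXct(df_demo$DateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Time of day as text, same idea as format(..., "%T") above
df_demo$time <- format(dt, "%H:%M:%S")

# POSIXlt exposes hour and minute as separate fields
lt <- as.POSIXlt(dt)

# Two slots per hour, plus one more if the minute is 30 or later;
# + 1 so that 00:00 maps to slot 1 (assumption)
df_demo$half_hour <- as.integer(lt$hour * 2 + lt$min %/% 30 + 1)

print(df_demo)
```

With this numbering 00:30 falls in slot 2 and 23:30 in slot 48.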
If you use dates and times a lot, lubridate may become helpful. Here I use ymd_hms() to convert the year-month-day hour-minute-second format into an actual datetime. Then use format.
This is not materially different than the other solutions, just a different way of converting back to a datetime.
library(lubridate)
ff1$time <- format(ymd_hms(ff1$DateTime), format = "%H:%M:%S")
> ff1
LCLid stdorToU DateTime KWH.hh..per.half.hour. Acorn Acorn_grouped date time
1 MAC000002 Std 2012-10-11 19:30:00 0 ACORN-A Affluent 2012-10-11 19:30:00
2 MAC000002 Std 2012-10-11 20:00:00 0 ACORN-A Affluent 2012-10-12 20:00:00
3 MAC000002 Std 2012-10-11 20:30:00 0 ACORN-A Affluent 2012-10-12 20:30:00
4 MAC000002 Std 2012-10-11 21:00:00 0 ACORN-A Affluent 2012-10-12 21:00:00
You could use strsplit.
sapply(strsplit(as.character(dat$x), " "), `[`, 1)
# [1] "2012-10-12" "2012-10-12" "2012-10-12" "2012-10-12" "2012-10-12"
sapply(strsplit(as.character(dat$x), " "), `[`, 2)
# [1] "00:30:00" "00:30:00" "00:30:00" "00:30:00" "00:30:00"
Data:
x <- "2012-10-12 00:30:00 BST"
dat <- data.frame(x=replicate(5, x))
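If the end goal is the 1-48 half-hour code from the question, the split-out time string can be turned into a number without any date parsing. This is just a sketch under a few assumptions: the vector `stamps` and the names `time_str` and `slot` are made up for illustration, every timestamp has the exact "YYYY-MM-DD HH:MM:SS TZ" layout, and 00:00 is counted as slot 1.

```r
# Hypothetical timestamps in the layout shown in the question
stamps <- c("2012-10-12 00:30:00 BST",
            "2012-10-12 01:00:00 BST",
            "2012-10-12 23:30:00 BST")

# Split on spaces and keep the second piece (the time of day)
parts <- strsplit(stamps, " ", fixed = TRUE)
time_str <- vapply(parts, `[`, character(1), 2)

# Hours are characters 1-2, minutes are characters 4-5 of "HH:MM:SS"
hh <- as.integer(substr(time_str, 1, 2))
mm <- as.integer(substr(time_str, 4, 5))

# Two slots per hour; + 1L so that 00:00 would be slot 1 (assumption)
slot <- hh * 2L + mm %/% 30L + 1L

print(data.frame(time = time_str, slot = slot))
```

Because this works purely on text, it ignores the time zone label, so the slot reflects the clock time exactly as written in the string.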