How to convert irregular times into an xts object using R
I have the following data.frame that I would like to convert into an xts() object, but I have been breaking my head trying to figure out how to format the times:
The data is arranged from most recent (at the top) to oldest (at the bottom). The problem is that the rows do not share a consistent format: some entries contain a date and a time, while others contain only a time. Because of this, I am having trouble formatting the column so that each row displays the correct date and time.
Desired output for the Date/Time column:
01/05/17 02:55 PM
01/05/17 11:40 AM
01/05/17 07:00 AM
12/30/16 05:50 PM
12/29/16 07:03 AM
12/30/16 07:00 AM
DATA:
data <- structure(list(Date = c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
"Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits +89.95%",
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday",
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire",
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%",
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)",
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date",
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")
Assuming there is a typo in the last line of your desired date-time output, and that you mean 12/29/16 07:00 AM, then whenever an element of the Date column is missing its date, take the most recently known date above it and roll it "backwards" (backwards in time, since the rows run from newest to oldest):
library(stringr)
library(xts)  # also attaches zoo, which provides na.locf()

# Split each entry on the space; entries that lack a date have only one part
l_datetime <- str_split(data$Date, " ")
data$ymd  <- vapply(l_datetime, function(x) if (length(x) == 2) x[[1]] else NA_character_, character(1))
data$time <- vapply(l_datetime, function(x) x[[length(x)]], character(1))
# Roll "backward" the latest known date for elements of column `Date` that are missing the Mon-DD-YY part
data$ymd <- na.locf(data$ymd)
# Carefully parse the time strings allowing for AM/PM (%b and %p are locale-dependent; an English locale is assumed):
psx_date <- as.POSIXct(paste(data$ymd, data$time), format = "%b-%d-%y %I:%M%p")
x_data <- xts(x = data[, c("News", "Symbol")], order.by = psx_date)
# > x_data
# News Symbol
# 2016-12-29 07:00:00 "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire" "ETRM"
# 2016-12-29 07:03:00 "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)" "ETRM"
# 2016-12-30 17:50:00 "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%" "ETRM"
# 2017-01-05 07:00:00 "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire" "ETRM"
# 2017-01-05 11:40:00 "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday" "ETRM"
# 2017-01-05 14:55:00 "ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits +89.95%" "ETRM"
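For readers who prefer to avoid extra packages, the same split-and-fill idea can be sketched in base R. This is a swapped-in technique rather than the answer's code: grepl() and sub() stand in for str_split(), and a cummax() index trick stands in for na.locf(). It assumes the first row always carries a date and that %b and %p are parsed in an English (or C) locale.

```r
# Date strings as they appear in the question (newest first)
dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# Entries starting with a digit hold only a time, so they have no date part
day_part  <- ifelse(grepl("^[0-9]", dates), NA_character_, sub(" .*$", "", dates))
time_part <- sub("^.* ", "", dates)

# Base R last-observation-carried-forward: for each row, take the index of the
# nearest row at or above it that has a date (assumes row 1 has a date)
idx <- cummax(seq_along(day_part) * !is.na(day_part))
day_filled <- day_part[idx]

# Parse with an explicit format; %b and %p assume an English or C locale
psx_date <- as.POSIXct(paste(day_filled, time_part),
                       format = "%b-%d-%y %I:%M%p", tz = "UTC")
format(psx_date, "%m/%d/%y %I:%M %p")
```

The resulting psx_date vector could then be passed as order.by to xts(), exactly as in the answer above.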
Use sub to replace a digit at the start of Date with NA, followed by a space, followed by that digit. From that, use read.table to create a two-column data frame with the date (or NA) in column 1 and the time in column 2. Fill in the NA values using na.locf, giving DF2. Now cbind DF2 and data[-1], and read the data.frame so created using read.zoo. Finally, convert the resulting "zoo" object to "xts".
library(xts)  # also attaches zoo, which provides na.locf() and read.zoo()

DF2 <- na.locf(read.table(text = sub("^(\\d)", "NA \\1", data$Date)))
z <- read.zoo(cbind(DF2, data[-1]), index = 1:2, tz = "", format = "%b-%d-%y %I:%M%p")
as.xts(z)
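To see what the sub() and read.table() steps produce before na.locf() fills the gaps, here is a minimal base R sketch run on the question's Date strings (the names dates, prefixed and parts are illustrative, not from the answer).

```r
dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# Prefix time-only entries (those starting with a digit) with "NA "
prefixed <- sub("^(\\d)", "NA \\1", dates)
prefixed[2]  # "NA 11:40AM"

# read.table() splits on whitespace and treats the literal "NA" as missing,
# giving a date column V1 (with gaps) and a time column V2
parts <- read.table(text = prefixed, stringsAsFactors = FALSE)
parts$V1
# na.locf(parts) would then carry each known date down into the gaps
```

Rows 2, 3 and 6 are the ones left with a missing date, which is exactly where na.locf() copies down the date from the row above.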
Here's a solution using the tidyquant package, which loads all the packages you need to solve this problem. As with the other solutions, you need the dates to have a consistent structure, such as:
"Jan-05-17 02:55 PM"
Using the lubridate package, you can convert to the POSIXct class with the mdy_hm() function as follows:
"Jan-05-17 02:55 PM" %>% lubridate::mdy_hm()
#> [1] "2017-01-05 14:55:00 UTC"
Here, the name mdy_hm() stands for month-day-year hour-minute. The output is the date in the correct date-time class (POSIXct).
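For comparison, here is a base R call that should yield the same POSIXct value as the lubridate::mdy_hm() example, assuming an English or C locale for %b and %p. Unlike mdy_hm(), as.POSIXct() needs the format spelled out explicitly.

```r
# Base R counterpart of lubridate::mdy_hm("Jan-05-17 02:55 PM")
psx <- as.POSIXct("Jan-05-17 02:55 PM", format = "%b-%d-%y %I:%M %p", tz = "UTC")
class(psx)  # "POSIXct" "POSIXt"
format(psx, "%Y-%m-%d %H:%M:%S %Z")
```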
The tidyquant package has a convenient function, as_xts(), with an argument, date_col, that when specified converts that data.frame date column into the xts index. I use the pipe (%>%) to make the code more readable and to show the workflow, together with the dplyr::mutate() function, which changes the Date column to the POSIXct class using lubridate::mdy_hm(). The final workflow looks like this:
library(tidyquant)

data %>%
  mutate(Date = lubridate::mdy_hm(Date)) %>%
  as_xts(date_col = Date)
Make sure every row of the Date column has a valid format, such as "Jan-05-17 02:55 PM", before trying the code snippet; otherwise lubridate::mdy_hm() will fail to parse those rows.
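One way to check this up front is a regular-expression test on the Date column before parsing. The pattern and the helper check_dates() below are hypothetical, written only for this sketch; they encode the "Jan-05-17 02:55 PM" layout (three-letter month, two-digit day and year, 12-hour time, a space, then AM or PM).

```r
# Hypothetical pattern for strings such as "Jan-05-17 02:55 PM"
valid_pattern <- "^[A-Z][a-z]{2}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2} [AP]M$"

# Hypothetical helper: stop with the offending row numbers if any row does not match
check_dates <- function(x) {
  bad <- which(!grepl(valid_pattern, x))
  if (length(bad) > 0) {
    stop("Unexpected Date format in row(s): ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

question_dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
                    "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")
test_dates <- c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Jan-05-17 07:00 AM",
                "Dec-30-16 05:50 PM", "Dec-29-16 07:03 AM", "Dec-29-16 07:00 AM")

grepl(valid_pattern, question_dates)  # FALSE for every row
check_dates(test_dates)               # passes silently
```

The question's original strings all fail the check (some lack a date, and none has a space before AM/PM), which is why the dates must be filled in and normalised first.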
The data I used to test is below:
data <- structure(list(Date = c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Jan-05-17 07:00 AM",
"Dec-30-16 05:50 PM", "Dec-29-16 07:03 AM", "Dec-29-16 07:00 AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits +89.95%",
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday",
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire",
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%",
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)",
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date",
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")
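If tidyquant is not available, the same conversion can be sketched with xts() directly once the dates are consistent. This is an alternative to as_xts(), not the answer's method; the three-row data frame below reuses dates from the test data, but its News text is placeholder, and base as.POSIXct() is used in place of mdy_hm() (English or C locale assumed).

```r
library(xts)  # also attaches zoo, which provides index() and coredata()

# Abbreviated test data; the News strings are placeholders
data_ok <- data.frame(
  Date   = c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Dec-29-16 07:00 AM"),
  News   = c("headline A", "headline B", "headline C"),
  Symbol = "ETRM",
  stringsAsFactors = FALSE
)

# Parse the already-consistent strings (%b and %p assume an English or C locale)
idx <- as.POSIXct(data_ok$Date, format = "%b-%d-%y %I:%M %p", tz = "UTC")

# xts() orders rows by the index, so the newest-first input comes out oldest-first
x_ok <- xts(data_ok[, c("News", "Symbol")], order.by = idx)
x_ok
```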