How to convert irregular times into XTS object using R

I have the following data.frame that I would like to convert into an xts() object, but I can't figure out how to format the times:

data.frame

The data is arranged from most recent (at the top) to oldest (at the bottom). The problem is that the rows do not share a consistent format: some contain a date and a time, others only a time. I am having trouble formatting the column so that each row displays the correct date and time.

Desired output for Date/Time Column:

01/05/17 02:55 PM
01/05/17 11:40 AM
01/05/17 07:00 AM
12/30/16 05:50 PM
12/29/16 07:03 AM
12/30/16 07:00 AM

DATA:

data <- structure(list(Date = c("Jan-05-17 02:55PM", "11:40AM", "07:00AM", 
"Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%", 
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday", 
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire", 
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%", 
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)", 
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date", 
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")

Assuming the last line of your desired date-time output contains a typo and you meant 12/29/16 07:00 AM, then whenever an element of the Date column is missing its date, take the most recently known date from the rows above and roll it "backwards" (the rows run from newest to oldest, so carrying a date down the column moves backwards in time):

library(stringr)
library(xts)  # also attaches zoo, which provides na.locf()

l_datetime <- str_split(data$Date, " ")
data$ymd <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[1]], NA)))
data$time <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[2]], x[[1]])))
# Carry the last known date down to the rows of `Date` that only contain a time
data$ymd <- na.locf(data$ymd)
# Carefully parse the time strings allowing for AM/PM:
psx_date <- as.POSIXct(paste(data$ymd, data$time), format = "%b-%d-%y %I:%M%p")

x_data <- xts(x = data[, c("News", "Symbol")], order.by = psx_date)
# > x_data
#                                                                                                         News                                  Symbol
# 2016-12-29 07:00:00 "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"                                           "ETRM"
# 2016-12-29 07:03:00 "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)"                                              "ETRM"
# 2016-12-30 17:50:00 "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%"                                    "ETRM"
# 2017-01-05 07:00:00 "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire" "ETRM"
# 2017-01-05 11:40:00 "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday"                                                       "ETRM"
# 2017-01-05 14:55:00 "ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%"                           "ETRM"
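
The same carry-the-date-down idea can also be sketched in base R alone, without stringr or zoo. This is only an illustrative variant, not the code above: the names `dates`, `has_date`, `day_part`, `filled` and `shown` are made up here, the time zone is fixed to UTC for reproducibility, and parsing `%b` and `%p` assumes an English locale.

```r
# Date strings copied from the question (newest first).
dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# Rows that carry a date contain a space between the date and the time.
has_date <- grepl(" ", dates, fixed = TRUE)
stopifnot(has_date[1])  # the first row must carry a date to fill from

# Date part of each dated row; cumsum(has_date) maps every row to the
# most recent dated row at or above it.
day_part <- sub(" .*$", "", dates[has_date])
filled <- ifelse(has_date, dates, paste(day_part[cumsum(has_date)], dates))

# Assumption: English month and AM/PM names, times interpreted as UTC.
psx <- as.POSIXct(filled, format = "%b-%d-%y %I:%M%p", tz = "UTC")

# Show the result in the layout requested in the question.
shown <- format(psx, "%m/%d/%y %I:%M %p")
print(shown)
```

`psx` could then be passed as `order.by` to xts() in the same way as `psx_date`.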

Use sub to replace a digit at the start of Date with NA followed by a space followed by the digit. From that, use read.table to create a 2-column data frame with the date (or NA) in column 1 and the time in column 2. Fill in the NA values using na.locf, giving DF2. Now cbind DF2 and data[-1], and read the data.frame so created using read.zoo. Finally, convert the resulting "zoo" object to "xts".

library(xts)  # also attaches zoo, which provides na.locf() and read.zoo()

DF2 <- na.locf(read.table(text = sub("^(\\d)", "NA \\1", data$Date)))
z <- read.zoo(cbind(DF2, data[-1]), index = 1:2, tz = "", format = "%b-%d-%y %I:%M%p")
as.xts(z)
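
To make the first step easier to follow, here is a small sketch of what sub() and read.table() produce before na.locf() fills the gaps. It uses base R only; `dates` is a copy of data$Date from the question, and `padded` and `DF2_raw` are names introduced here purely for illustration.

```r
# Date strings copied from the question (newest first).
dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# Prefix time-only rows (those starting with a digit) with an "NA" field,
# so that every row has two whitespace-separated fields.
padded <- sub("^(\\d)", "NA \\1", dates)

# read.table() splits on whitespace and reads the literal "NA" as missing.
DF2_raw <- read.table(text = padded, stringsAsFactors = FALSE)
print(DF2_raw)
```

Column V1 is NA exactly on the time-only rows, and those are the cells that na.locf() then fills from the row above.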

Here's a solution using the tidyquant package, which loads all the packages you need to solve this problem. As with the other solutions, every row needs a complete date-time in a consistent structure, such as:

"Jan-05-17 02:55 PM"

Using the lubridate package, you can convert this to the POSIXct class with the mdy_hm() function as follows:

"Jan-05-17 02:55 PM" %>% lubridate::mdy_hm()
# [1] "2017-01-05 14:55:00 UTC"

Here lubridate::mdy_hm() stands for month-day-year hour-minute. The output is the date in the correct date-time class.

The tidyquant package has a convenient function, as_xts(), with an argument, date_col, that when specified converts the data.frame date column to xts row names. I use the pipe ( %>% ) to make the code more readable and to show the workflow, and the dplyr::mutate() function to change the Date column to the POSIXct class using lubridate::mdy_hm(). The final workflow looks like this:

library(tidyquant)  # per the text above, loads the packages used here

data %>%
    mutate(Date = lubridate::mdy_hm(Date)) %>%
    as_xts(date_col = Date)

Make sure every row of the Date column has a valid format such as "Jan-05-17 02:55 PM" before trying the code snippet; otherwise lubridate::mdy_hm() will fail to parse those rows.
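
One way to get the question's original column into that shape is sketched below. It is an assumption-laden illustration rather than part of the workflow above: it relies on dplyr, tidyr and lubridate being installed, on tidyr::fill() to carry each date down to the time-only rows, and on made-up names (`news`, `day`, `time`, `clean`). Only the Date and Symbol columns are reproduced to keep it short.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Date strings copied from the question (newest first).
news <- data.frame(
  Date = c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM"),
  Symbol = "ETRM",
  stringsAsFactors = FALSE
)

clean <- news %>%
  mutate(
    # Date part where present, NA on time-only rows.
    day = if_else(grepl(" ", Date, fixed = TRUE),
                  sub(" .*$", "", Date), NA_character_),
    # Time part, with a space inserted before AM/PM ("02:55PM" -> "02:55 PM").
    time = sub("(AM|PM)$", " \\1", sub("^.* ", "", Date))
  ) %>%
  fill(day, .direction = "down") %>%           # carry each date down the column
  mutate(Date = mdy_hm(paste(day, time))) %>%  # e.g. "Jan-05-17 02:55 PM"
  select(-day, -time)

print(clean)
```

After this step every row holds a full POSIXct date-time (in UTC, the lubridate default), so the column is in the state the workflow expects.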

The data I used to test is below:

data <- structure(list(
  Date = c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Jan-05-17 07:00 AM",
           "Dec-30-16 05:50 PM", "Dec-29-16 07:03 AM", "Dec-29-16 07:00 AM"),
  News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%",
           "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday",
           "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire",
           "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%",
           "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)",
           "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"),
  Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")),
  .Names = c("Date", "News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")
