
How to convert irregular times into XTS object using R

I would like to convert the following data.frame into an xts() object, but I keep getting stuck on how to format the times in the data.frame.

Data frame:

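The data are ordered from most recent (at the top) to oldest (at the bottom). The problem is that the rows are not formatted consistently: some entries have a date and a time, while others have only a time. Because of this, I am having trouble formatting them so that every row shows the correct date and time.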

Desired output for the date/time column:

01/05/17 02:55 PM
01/05/17 11:40 AM
01/05/17 07:00 AM
12/30/16 05:50 PM
12/29/16 07:03 AM
12/30/16 07:00 AM

Data:

data <- structure(list(Date = c("Jan-05-17 02:55PM", "11:40AM", "07:00AM", 
"Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%", 
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday", 
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire", 
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%", 
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)", 
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date", 
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")

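Assuming the last row of your expected date-time output has a typo and you meant 12/29/16 07:00 AM, then whenever an element of the Date column is missing its date, fill in the most recent known date, rolling it "backward" in time (i.e. taking it from the row above):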

library(stringr)
library(xts)  # also attaches zoo, which provides na.locf()

# Split each entry on the space: "Mon-DD-YY HH:MMAM" gives 2 pieces, "HH:MMAM" gives 1
l_datetime <- str_split(data$Date, " ")
data$ymd <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[1]], NA)))
data$time <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[2]], x[[1]])))
# Roll "backward" the latest known date for elements of column `Date` that have no Mon-DD-YY part
data$ymd <- na.locf(data$ymd)
# Carefully parse the time strings allowing for AM/PM:
psx_date <- as.POSIXct(paste(data$ymd, data$time), format = "%b-%d-%y %I:%M%p")

x_data <- xts(x = data[, c("News", "Symbol")], order.by = psx_date)
# > x_data
#                                                                                                         News                                  Symbol
# 2016-12-29 07:00:00 "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"                                           "ETRM"
# 2016-12-29 07:03:00 "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)"                                              "ETRM"
# 2016-12-30 17:50:00 "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%"                                    "ETRM"
# 2017-01-05 07:00:00 "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire" "ETRM"
# 2017-01-05 11:40:00 "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday"                                                       "ETRM"
# 2017-01-05 14:55:00 "ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%"                           "ETRM"
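If you would rather avoid stringr and zoo, the same "carry the last known date" idea can be sketched in base R. This is an illustrative variant, not the answer's code: `dates_raw` is a hypothetical stand-in for data$Date, the first row is assumed to carry a date, the locale is switched to "C" because %b and %p expect English month and AM/PM names, and UTC is used so the result does not depend on the local time zone.

```r
# Assumption: English month abbreviations and AM/PM markers, so use the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical stand-in for data$Date from the question
dates_raw <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
               "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

has_date  <- grepl(" ", dates_raw)          # "Mon-DD-YY HH:MMAM" contains a space
date_part <- sub(" .*$", "", dates_raw)     # text before the space (the date, when present)
time_part <- sub("^.* ", "", dates_raw)     # the last token is always the time

# For each row, the position of the latest row at or above it that had a date
# (assumes row 1 has one); this plays the role of na.locf()
src <- cummax(ifelse(has_date, seq_along(dates_raw), 0L))

psx <- as.POSIXct(paste(date_part[src], time_part),
                  format = "%b-%d-%y %I:%M%p", tz = "UTC")

# Render in the question's desired MM/DD/YY HH:MM AM/PM layout
print(format(psx, "%m/%d/%y %I:%M %p"))
```

The cummax() trick works because rows with a date contribute their own position and time-only rows contribute 0, so the running maximum always points back to the nearest dated row above.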

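Use sub to replace a leading digit in Date with NA followed by a space and then that digit. Use read.table on the result to create a 2-column data frame with the date (or NA) in column 1 and the time in column 2. Fill in the NA values with na.locf, giving DF2. Now cbind DF2 and data[-1] together and read the result with read.zoo to create a zoo object. Finally, convert the resulting "zoo" object to "xts".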

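library(xts)  # also attaches zoo (na.locf, read.zoo)

# Time-only entries start with a digit; prefix them with "NA " so every line has 2 fields
DF2 <- na.locf(read.table(text = sub("^(\\d)", "NA \\1", data$Date)))
# Columns 1:2 (date and time) are pasted together and parsed as the index
z <- read.zoo(cbind(DF2, data[-1]), index = 1:2, tz = "", format = "%b-%d-%y %I:%M%p")
as.xts(z)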
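To see what the sub() and read.table() steps produce before na.locf runs, here is a base-R sketch on the question's raw Date values. `dates_raw` is an illustrative name, and the final fill uses a cummax() index trick as a stand-in for zoo's na.locf so the snippet runs without extra packages.

```r
# Illustrative copy of the question's raw Date column (hypothetical name)
dates_raw <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
               "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# Step 1: entries starting with a digit are time-only, so prefix them with "NA "
prefixed <- sub("^(\\d)", "NA \\1", dates_raw)
print(prefixed)

# Step 2: read.table splits each line on whitespace into V1 (date or NA) and V2 (time);
# the literal "NA" becomes a missing value because na.strings defaults to "NA"
DF <- read.table(text = prefixed, stringsAsFactors = FALSE)
print(DF)

# Step 3 (base-R stand-in for zoo::na.locf): copy the last non-NA date down
filled_idx <- cummax(ifelse(is.na(DF$V1), 0L, seq_len(nrow(DF))))
DF$V1 <- DF$V1[filled_idx]
print(DF)
```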

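Here is a solution using the tidyquant package, which loads all of the packages needed for this problem. As with the other solutions, the dates need to be in a consistent format, such as: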

"Jan-05-17 02:55 PM"

With the lubridate package, the mdy_hm() function converts this string to the POSIXct class:

library(magrittr)  # provides %>% (also available once tidyquant/dplyr is loaded)
"Jan-05-17 02:55 PM" %>% lubridate::mdy_hm()
#> [1] "2017-01-05 14:55:00 UTC"

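The name lubridate::mdy_hm() stands for month-day-year hour-minute, i.e. the order in which those components appear in the string. The output is a date-time of the proper date-time class (POSIXct).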
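For comparison, the same string can be parsed without lubridate by giving as.POSIXct() an explicit strptime format. This is a base-R sketch, not part of the answer: the format string is our own assumption matching the "Mon-DD-YY HH:MM AM/PM" layout, the month and AM/PM names are locale dependent (hence the switch to the C locale), and tz = "UTC" mirrors mdy_hm()'s default time zone.

```r
# %b and %p expect English month/AM-PM names, so use the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

x <- as.POSIXct("Jan-05-17 02:55 PM", format = "%b-%d-%y %I:%M %p", tz = "UTC")
print(x)
```

The trade-off is that mdy_hm() infers separators for you, while the explicit format fails (returns NA) as soon as the layout deviates from what you wrote.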

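The tidyquant package has a convenient function, as_xts(), with an argument date_col; when it is specified, the given date column of the data.frame is converted into the xts index (row names). I use the pipe (%>%) to make the code more readable and show the workflow, and dplyr::mutate() to convert the Date column to POSIXct with lubridate::mdy_hm(). The final workflow looks like this: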

library(tidyquant)  # loads dplyr, lubridate, xts and friends

data %>%
    mutate(Date = lubridate::mdy_hm(Date)) %>%
    as_xts(date_col = Date)

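Before trying the snippet, make sure every row in the Date column has a valid format such as "Jan-05-17 02:55 PM"; otherwise lubridate::mdy_hm() will fail to parse those rows (it returns NA for them with a warning).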
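Rather than fixing the Date column by hand, a base-R sketch can rewrite the question's raw values into that layout before they reach mdy_hm(): carry each date down to the time-only rows and insert a space before AM/PM. `dates_raw` and `normalized` are illustrative names, and the sketch assumes the first row always has a date.

```r
# Hypothetical stand-in for the question's raw data$Date
dates_raw <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
               "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

has_date  <- grepl(" ", dates_raw)          # "Mon-DD-YY HH:MMAM" contains a space
date_part <- sub(" .*$", "", dates_raw)     # text before the space (the date, when present)
time_part <- sub("^.* ", "", dates_raw)     # the last token is always the time

# Position of the latest row at or above each row that carried a date
src <- cummax(ifelse(has_date, seq_along(dates_raw), 0L))

# "02:55PM" -> "02:55 PM", then prepend the carried-down date
normalized <- paste(date_part[src], sub("(AM|PM)$", " \\1", time_part))
print(normalized)
```

The result has the same shape as the test data below, so it can be assigned back to data$Date before running the tidyquant pipeline.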

The data I used for testing is as follows:

data <- structure(list(Date = c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Jan-05-17 07:00 AM", 
                            "Dec-30-16 05:50 PM", "Dec-29-16 07:03 AM", "Dec-29-16 07:00 AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%", 
                                                                                           "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday", 
                                                                                           "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire", 
                                                                                           "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%", 
                                                                                           "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)", 
                                                                                           "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
                            ), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date", 
                                                                                                       "News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")


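Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM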