
Convert a string date range to separate start and stop dates in R

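I have a dataset (8000 observations) that contains a string date variable. I want to split the variable into StartDt and EndDt in the format "%B %d %Y". Some ranges also span calendar years, e.g. "Dec 30 to Jan 5 2019". I tried the stringr package without success. Any insight is appreciated!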

Df<-data.frame(Date2=c("Dec 16 to 22 2018","Dec 23 to 29 2018", "Dec 30 to Jan 5 2019"))

This is a bit long because it takes several steps, but it uses only vectorized functions:


library(glue)
library(stringr)
Df<-data.frame(Date2=c("Dec 16 to 22 2018","Dec 23 to 29 2018", "Dec 30 to Jan 5 2019"))

## a regular expression to match abbreviated month names:
mnthrx <- paste0( "(?:", paste( month.abb, collapse="|" ), ")" )

## the big regex we will use to match it all:
rx <- glue( "({mnthrx}) (\\d+) to (?:({mnthrx}) )?(\\d+) (\\d+)" )

m <- str_match( Df$Date2, rx )

## The end date:
day2 <- as.integer(m[,5])
month2 <- m[,4]
year2 <- as.integer( m[, ncol(m)])

## The start date:
day1 <- m[,3]
month1 <- m[,2]
year1 <- year2

## if month2 is missing, it's because the range ends in the same month (month1)
j <- is.na(month2)
month2[j] <- month1[j]

month.number1 <- match( month1, month.abb )
month.number2 <- match( month2, month.abb )

## if month2 comes before month1, the range crosses New Year, so the start date is in the previous year:
i.next.year <- month.number2 < month.number1
year1[i.next.year] <- year2[i.next.year]-1

data.frame(
    StartDt = paste( month1,day1,year1, sep=" " ),
    EndDt = paste( month2,day2,year2, sep=" " )
)

It produces this:


      StartDt       EndDt
1 Dec 16 2018 Dec 22 2018
2 Dec 23 2018 Dec 29 2018
3 Dec 30 2018  Jan 5 2019
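The StartDt and EndDt columns above are still character strings. If real Date objects are wanted, the following sketch parses them with as.Date(). It makes two assumptions.

* The month names are English abbreviations. The code therefore sets the time locale to "C", which changes a global session setting.
* In R's strptime rules, "%b" and "%B" both accept abbreviated or full month names on input. On output, "%B" prints the full name.

```r
# Use the "C" time locale so English month names such as "Dec" are recognised
# (this changes a global setting for the session)
invisible(Sys.setlocale("LC_TIME", "C"))

# Character start/end strings, as produced by the answer above
res <- data.frame(
  StartDt = c("Dec 16 2018", "Dec 23 2018", "Dec 30 2018"),
  EndDt   = c("Dec 22 2018", "Dec 29 2018", "Jan 5 2019"),
  stringsAsFactors = FALSE
)

# Parse into Date objects: month abbreviation, day, four-digit year
res$StartDt <- as.Date(res$StartDt, format = "%b %d %Y")
res$EndDt   <- as.Date(res$EndDt,   format = "%b %d %Y")

# Dates can now be compared, subtracted, or reformatted with the full month name
res$EndDt - res$StartDt
format(res$EndDt, "%B %d %Y")
```

Note that "%d" zero-pads on output, so the last end date is formatted as "January 05 2019".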

Use str_match with a regular expression to capture the required values from the string. The parts of the pattern followed by ? are optional.
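The following sketch shows what the optional parts of the pattern do. It runs the same pattern on two inputs and checks the capture-group columns. When the optional end-month group does not take part in the match, str_match reports NA for it. This assumes the stringr package is installed.

```r
library(stringr)

# The answer's pattern: "\\s?" and "([A-Za-z]+)?" make the end month optional
rx <- "([A-Za-z]+)\\s(\\d+)\\sto\\s?([A-Za-z]+)?\\s(\\d+)\\s(\\d+)"
m <- str_match(c("Dec 16 to 22 2018", "Dec 30 to Jan 5 2019"), rx)

# Column 1 is the full match; columns 2 to 6 are the five capture groups.
# Column 4 (the optional end month) is NA when the range stays within one month.
m[, 4]
```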

# Extract the capture groups into a data frame (dropping the full-match column)
dat <- as.data.frame(stringr::str_match(Df$Date2, '([A-Za-z]+)\\s(\\d+)\\sto\\s?([A-Za-z]+)?\\s(\\d+)\\s(\\d+)')[, -1])
# Convert each column to its appropriate type (day and year columns become integer)
dat <- type.convert(dat, as.is = TRUE)
# Copy the year column: V5 will hold the start year, V6 the end year
dat$V6 <- dat$V5
# If no end month was captured, the end month is the same as the start month
dat$V3[is.na(dat$V3)] <- dat$V1[is.na(dat$V3)]
# Subtract 1 from the start year only if the end month is earlier than the start month
dat <- transform(dat, V5 = V5 - as.integer(match(V1, month.abb) > match(V3, month.abb)))

#Create the final result dataframe pasting the values
result <- data.frame(Start = with(dat, paste(V1, V2, V5)), 
                     End   = with(dat, paste(V3, V4, V6)))
result

#        Start         End
#1 Dec 16 2018 Dec 22 2018
#2 Dec 23 2018 Dec 29 2018
#3 Dec 30 2018  Jan 5 2019
#4 Apr 15 2018 May 20 2018

Data

An extra date ("Apr 15 to May 20 2018") was added to the input for testing.

Df <- data.frame(Date2=c("Dec 16 to 22 2018","Dec 23 to 29 2018", 
                         "Dec 30 to Jan 5 2019", "Apr 15 to May 20 2018"))
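For comparison, the following sketch applies the same month-comparison rule that both answers use, in base R without stringr. The helper name split_date_range is hypothetical. It assumes every string has the shape "Mon D to [Mon] D YYYY" with single spaces. It also builds ISO dates from R's built-in month.abb vector, so it does not depend on the locale.

```r
# Hypothetical base-R helper (not from either answer): split
# "Mon D to [Mon] D YYYY" strings into start and end Dates
split_date_range <- function(x) {
  x <- as.character(x)  # guard against factor input
  halves <- strsplit(x, " to ", fixed = TRUE)
  left   <- lapply(halves, function(h) strsplit(h[1], " ", fixed = TRUE)[[1]])
  right  <- lapply(halves, function(h) strsplit(h[2], " ", fixed = TRUE)[[1]])

  # The left side is always "Mon D"
  month1 <- vapply(left, function(t) t[1], character(1))
  day1   <- as.integer(vapply(left, function(t) t[2], character(1)))

  # The right side is "Mon D YYYY", or just "D YYYY" when the month is unchanged
  has_month <- lengths(right) == 3
  month2 <- ifelse(has_month, vapply(right, function(t) t[1], character(1)), month1)
  day2   <- as.integer(vapply(right, function(t) t[length(t) - 1], character(1)))
  year2  <- as.integer(vapply(right, function(t) t[length(t)], character(1)))

  # month.abb holds R's built-in English abbreviations ("Jan", ..., "Dec")
  m1 <- match(month1, month.abb)
  m2 <- match(month2, month.abb)

  # An end month earlier than the start month means the range crossed New Year
  year1 <- year2 - as.integer(m2 < m1)

  data.frame(
    StartDt = as.Date(sprintf("%04d-%02d-%02d", year1, m1, day1)),
    EndDt   = as.Date(sprintf("%04d-%02d-%02d", year2, m2, day2))
  )
}

Df <- data.frame(Date2 = c("Dec 16 to 22 2018", "Dec 23 to 29 2018",
                           "Dec 30 to Jan 5 2019", "Apr 15 to May 20 2018"))
r <- split_date_range(Df$Date2)
r
```

Splitting on the literal " to " avoids regular expressions altogether. The trade-off is that any variation in spacing or wording would need extra handling.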
