Generate a sequence of quarter dates in R

I am new to R and have a data frame that looks something like this.

 Date       A       B
1990 Q1     2       3
     Q2     4       2
     Q3     7       6
     Q4     5       3
1991 Q1     7       6
     Q2     1       8
     Q3     7       6
     Q4     9       2
1992 Q1     1       7
     Q2     4       6
     Q3     1       3
     Q4     5       8
...

The column runs all the way down to the last row, and neither the start date nor the end date is fixed because the data is constantly updated. I would like to convert the date column to a date class and get something like this:

 Date       A       B
1990 Q1     2       3
1990 Q2     4       2
1990 Q3     7       6
1990 Q4     5       3
1991 Q1     7       6
1991 Q2     1       8
1991 Q3     7       6
1991 Q4     9       2
1992 Q1     1       7
1992 Q2     4       6
1992 Q3     1       3
1992 Q4     5       8
...

I thought of creating a new column of dates on the left, using the first date in the data (i.e. '1990 Q1') as the starting date and the number of rows as the length. I was looking at the seq and as.yearqtr commands but can't seem to work out proper code for it. Does anyone know a better way to do this?

Assuming Date is a single character column, here's an option using tidyr:

library(tidyr)

# separate date into year and quarter, inserting NAs in year as necessary
df %>% separate(Date, into = c('year', 'quarter'), fill = 'left') %>% 
    # fill NAs with previous value
    fill(year) %>% 
    # join year and quarter back into a single column
    unite(Date, year, quarter, sep = ' ')

#       Date A B
# 1  1990 Q1 2 3
# 2  1990 Q2 4 2
# 3  1990 Q3 7 6
# 4  1990 Q4 5 3
# 5  1991 Q1 7 6
# 6  1991 Q2 1 8
# 7  1991 Q3 7 6
# 8  1991 Q4 9 2
# 9  1992 Q1 1 7
# 10 1992 Q2 4 6
# 11 1992 Q3 1 3
# 12 1992 Q4 5 8

Data

df <- structure(list(Date = structure(c(1L, 4L, 5L, 6L, 2L, 4L, 5L, 
        6L, 3L, 4L, 5L, 6L), .Label = c("1990 Q1", "1991 Q1", "1992 Q1", 
        "Q2", "Q3", "Q4"), class = "factor"), A = c(2L, 4L, 7L, 5L, 7L, 
        1L, 7L, 9L, 1L, 4L, 1L, 5L), B = c(3L, 2L, 6L, 3L, 6L, 8L, 6L, 
        2L, 7L, 6L, 3L, 8L)), .Names = c("Date", "A", "B"), class = "data.frame", row.names = c(NA, 
        -12L))

Here is a straightforward way to create the sequence you are looking for:

numrows <- 10  # number of elements desired

# create the sequence of Date objects, one per quarter
qtrseq <- seq(as.Date("1990-01-01"), by = "quarter", length.out = numrows)

# create a character vector for the formatted display, e.g. "1990 Q1"
qtrformatted <- paste(as.POSIXlt(qtrseq)$year + 1900, quarters(qtrseq))

The downside of this method and the other listed solutions is the loss of the Date object. There is no good way in base R to format the dates as Q1, Q2... and have the object remain a Date object. Depending on your application, it might be best to store the date sequence in the data frame and use the formatted quarter strings for output purposes only. Best of luck.
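The suggestion above, keeping real Date values and formatting them only for display, could be sketched as follows. This is a minimal illustration, not the answerer's code: the names n, qtr_dates, out and Label are made up here, and the row count is hard-coded where nrow(df) would normally be used.

```r
# Hypothetical sketch: store Date objects, derive the "YYYY Qn" label separately.
n <- 6  # assumed row count; in practice this would be nrow(df)

# One Date per quarter, starting at the first day of 1990 Q1
qtr_dates <- seq(as.Date("1990-01-01"), by = "quarter", length.out = n)

out <- data.frame(
  Date  = qtr_dates,                                          # stays class "Date"
  Label = paste(format(qtr_dates, "%Y"), quarters(qtr_dates)), # display only
  stringsAsFactors = FALSE
)

out$Label
```

The Date column can still be sorted, differenced, or plotted as a date, while Label is only used when printing.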

We could do this in base R. Create a grouping variable using grepl and cumsum, extract the numeric substring from 'Date', replace the '' values with the year values using ave, and then paste the result together with the quarter substring extracted using sub.

df$Date <-  paste(ave(sub("\\s*Q.", "", df$Date),
     cumsum(grepl("^\\d+", df$Date)), FUN = function(x) x[nzchar(x)]),
   sub("^\\d+\\s+", "", df$Date))
df$Date
#[1] "1990 Q1" "1990 Q2" "1990 Q3" "1990 Q4" "1991 Q1" "1991 Q2" 
#[7] "1991 Q3" "1991 Q4" "1992 Q1" "1992 Q2" "1992 Q3" "1992 Q4"

No additional packages needed.
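To make the one-liner easier to follow, the same base R steps can be written out one at a time. This is only an unpacked sketch of the approach on a shortened example vector; the names x, starts_year, grp, year, qtr and filled are illustrative and not part of the original answer.

```r
# Shortened example input (assumed); in practice this would be df$Date
x <- c("1990 Q1", "Q2", "Q3", "Q4", "1991 Q1", "Q2", "Q3", "Q4")

# TRUE wherever a row begins with a year
starts_year <- grepl("^\\d+", x)

# The running total of those flags gives one group id per year block
grp <- cumsum(starts_year)

# Drop the quarter part, leaving the year or an empty string
year_raw <- sub("\\s*Q.", "", x)

# Within each group, spread the single non-empty year over all rows
year <- ave(year_raw, grp, FUN = function(v) v[nzchar(v)])

# Drop the leading year, leaving only the quarter
qtr <- sub("^\\d+\\s+", "", x)

filled <- paste(year, qtr)
filled
```

This relies on every block starting with a row that carries the year, as in the sample data.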


If we need a package solution, data.table can be used:

library(data.table)
library(stringr)
setDT(df)[, Date:=sub("^(Q.*)", paste0(word(Date[1],1), " \\1") , Date), 
                                                 cumsum(grepl("^\\d+" , Date))]
df
#       Date A B
# 1: 1990 Q1 2 3
# 2: 1990 Q2 4 2
# 3: 1990 Q3 7 6
# 4: 1990 Q4 5 3
# 5: 1991 Q1 7 6
# 6: 1991 Q2 1 8
# 7: 1991 Q3 7 6
# 8: 1991 Q4 9 2
# 9: 1992 Q1 1 7
#10: 1992 Q2 4 6
#11: 1992 Q3 1 3
#12: 1992 Q4 5 8

Data

df <- structure(list(Date = c("1990 Q1", "Q2", "Q3", "Q4", "1991 Q1", 
"Q2", "Q3", "Q4", "1992 Q1", "Q2", "Q3", "Q4"), A = c(2L, 4L, 
7L, 5L, 7L, 1L, 7L, 9L, 1L, 4L, 1L, 5L), B = c(3L, 2L, 6L, 3L, 
6L, 8L, 6L, 2L, 7L, 6L, 3L, 8L)), .Names = c("Date", "A", "B"
), row.names = c(NA, -12L), class = "data.frame")

To use the as.yearqtr function from the zoo package to create a year-quarter time series, first split the df$Date values into year and quarter strings, then use na.locf, also from zoo, to fill the missing year values with the value from the previous row, and finally convert the result to a zoo time series indexed by year-quarter dates. The code would look like this:

   library(zoo)
#
# split Date into year and quarter strings
#
  tmp <- t(sapply(strsplit((df$Date)," "), function(x) if(length(x)==1) c(NA, x) else x) ) 
#
# use na.locf to replace NA with previous year
#
  tmp <- paste(na.locf(tmp[,1]), tmp[,2])
#
#   transform df into a zoo time series object with yearqtr dates
#
 df_zoo <- zoo(df[,-1], order.by = as.yearqtr(tmp))
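The idea from the question, building the whole sequence from the first date and the number of rows, can also be sketched with as.yearqtr. This assumes the rows are consecutive quarters with no gaps; first_label, n, start and qtrs are hypothetical names, and the two hard-coded values stand in for df$Date[1] and nrow(df).

```r
library(zoo)

first_label <- "1990 Q1"  # assumed; in practice df$Date[1]
n <- 12                   # assumed; in practice nrow(df)

# yearqtr values are stored as year + (quarter - 1) / 4
start <- as.numeric(as.yearqtr(first_label))

# Step forward a quarter (0.25 of a year) at a time
qtrs <- as.yearqtr(start + (seq_len(n) - 1) / 4)

format(qtrs)   # character labels such as "1990 Q1"
as.Date(qtrs)  # first day of each quarter as Date objects
```

Unlike the fill-based answers, this ignores the remaining values in the Date column, so it would give wrong labels if a quarter were missing from the data.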

Here is something you can try:

library(dplyr); library(stringr); library(zoo)
df %>% mutate(Date = paste(na.locf(str_extract(Date, "^[0-9]{4}")),     
                                   str_extract(Date, "Q[1-4]$"), sep = " "))
      Date A B
1  1990 Q1 2 3
2  1990 Q2 4 2
3  1990 Q3 7 6
4  1990 Q4 5 3
5  1991 Q1 7 6
6  1991 Q2 1 8
7  1991 Q3 7 6
8  1991 Q4 9 2
9  1992 Q1 1 7
10 1992 Q2 4 6
11 1992 Q3 1 3
12 1992 Q4 5 8
