
Interpolate year-month-day from year, month, and week data in R

I've inherited a dataset with measurements spanning 1970–2019. The head and tail look something like this:

year  month  week    X1 
1970      1     1   0.21
1970      1     2   0.22
1970      1     3   0.34
1970      1     4   0.34
1970      2     5   0.35
1970      2     6   0.25
... 
2019     11    47   0.063
2019     12    48   0.062
2019     12    49   0.068
2019     12    50   0.067
2019     12    51   0.074
2019     12    52   0.075

Each observation of X1 was recorded on the first day of each week (i.e., Monday). I'd like to create a date column in ISO 8601 format (yyyy-mm-dd). Given year, month, and week, it should be possible to work out which day of the month the Monday of each week falls on. Note: measurements were taken every Monday, regardless of holidays.
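The sample rows are consistent with week n being the n-th Monday of the calendar year (week 5 of 1970 falls in February, week 52 of 2019 in December). Under that assumption, here is a minimal base-R sketch that finds the first Monday on or after 1 January and adds 7 days per week, then uses the month column as a sanity check. The data frame name obs is hypothetical, and the week convention is an assumption, not something the question states outright.

```r
# Sample rows from the head and tail shown above (obs is a hypothetical name)
obs <- data.frame(
  year  = c(1970, 1970, 1970, 1970, 1970, 1970, 2019, 2019, 2019, 2019, 2019, 2019),
  month = c(1, 1, 1, 1, 2, 2, 11, 12, 12, 12, 12, 12),
  week  = c(1, 2, 3, 4, 5, 6, 47, 48, 49, 50, 51, 52),
  X1    = c(0.21, 0.22, 0.34, 0.34, 0.35, 0.25, 0.063, 0.062, 0.068, 0.067, 0.074, 0.075)
)

# 1 January of each row's year
jan1 <- as.Date(paste0(obs$year, "-01-01"))
# Weekday of 1 January via %u: 1 = Monday ... 7 = Sunday (locale-independent)
jan1_wday <- as.integer(format(jan1, "%u"))
# First Monday on or after 1 January (offset is 0 when 1 January is a Monday)
first_monday <- jan1 + (8 - jan1_wday) %% 7
# Assumption: week n is the n-th Monday of the year
obs$date <- first_monday + 7 * (obs$week - 1)

# Sanity check: every computed date falls in the recorded month
stopifnot(all(as.integer(format(obs$date, "%m")) == obs$month))
obs
```

format(obs$date) then gives the yyyy-mm-dd strings directly.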

You can use base R:

df <- data.frame(
  year = c(1970,1970,1970,1970,1970,1970,2019,2019,2019,2019),
  month = c(1,1,1,1,2,2,11,12,12,12),
  week = c(1,2,3,4,5,6,47,48,49,50)
)

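# Build strings like "1970-1-1": year, week of the year, weekday (1 = Monday)
df$date_string <- paste(df$year, df$week, 1, sep = "-")
# %W: week of the year with Monday as the first day; %u: weekday, Monday = 1
df$date <- as.Date(x = df$date_string, format = "%Y-%W-%u")

You can have a look at the strptime documentation for the format codes: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/strptime

'%W' converts the week of the year, counting weeks that start on Monday (so week 1 begins on the first Monday of the year), and the '1' read by '%u' selects Monday within that week. '%U' counts weeks that start on Sunday instead; it gives the same dates for 1970 and 2019, but in years that begin on a Monday (e.g. 1973 and 2018) it puts week 1 on the second Monday of the year.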
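To see where the two week conventions part ways, parse week 1 of 2018 (a year that starts on a Monday) with each. The sketch assumes R's strptime() uses the week number on input when a weekday ('%u') is also supplied. With '%U' the result should be 2018-01-08 (the second Monday of the year), with '%W' it should be 2018-01-01; for 1970, which starts on a Thursday, both should give 1970-01-05.

```r
# 2018 starts on a Monday, so the two conventions disagree about week 1
us_week1 <- as.Date("2018-1-1", format = "%Y-%U-%u")  # %U: weeks start on Sunday
uk_week1 <- as.Date("2018-1-1", format = "%Y-%W-%u")  # %W: weeks start on Monday
us_week1
uk_week1

# 1970 starts on a Thursday; here both conventions pick the same Monday
us_1970 <- as.Date("1970-1-1", format = "%Y-%U-%u")
uk_1970 <- as.Date("1970-1-1", format = "%Y-%W-%u")
us_1970 == uk_1970
```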

This is really just a one-liner. You can generate a vector of every Monday since 5th January 1970 using the lubridate package like this:

as.POSIXct("1970-01-05") + lubridate::days(0:2616 * 7)

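0:2616 gives 2,617 consecutive Mondays, which takes you up to 24 February 2020, the day this answer was written. Attaching the vector to the data by position only works if the dataset has exactly one row per Monday, in date order, with no missing weeks.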

Here's a reprex showing the first 100 Mondays since the start of 1970 (the BST/GMT labels appear because as.POSIXct() uses the session's local time zone, here a UK one):

head(as.POSIXct("1970-01-05") + lubridate::days(0:2616 * 7), 100)
#>   [1] "1970-01-05 BST" "1970-01-12 BST" "1970-01-19 BST" "1970-01-26 BST"
#>   [5] "1970-02-02 BST" "1970-02-09 BST" "1970-02-16 BST" "1970-02-23 BST"
#>   [9] "1970-03-02 BST" "1970-03-09 BST" "1970-03-16 BST" "1970-03-23 BST"
#>  [13] "1970-03-30 BST" "1970-04-06 BST" "1970-04-13 BST" "1970-04-20 BST"
#>  [17] "1970-04-27 BST" "1970-05-04 BST" "1970-05-11 BST" "1970-05-18 BST"
#>  [21] "1970-05-25 BST" "1970-06-01 BST" "1970-06-08 BST" "1970-06-15 BST"
#>  [25] "1970-06-22 BST" "1970-06-29 BST" "1970-07-06 BST" "1970-07-13 BST"
#>  [29] "1970-07-20 BST" "1970-07-27 BST" "1970-08-03 BST" "1970-08-10 BST"
#>  [33] "1970-08-17 BST" "1970-08-24 BST" "1970-08-31 BST" "1970-09-07 BST"
#>  [37] "1970-09-14 BST" "1970-09-21 BST" "1970-09-28 BST" "1970-10-05 BST"
#>  [41] "1970-10-12 BST" "1970-10-19 BST" "1970-10-26 BST" "1970-11-02 BST"
#>  [45] "1970-11-09 BST" "1970-11-16 BST" "1970-11-23 BST" "1970-11-30 BST"
#>  [49] "1970-12-07 BST" "1970-12-14 BST" "1970-12-21 BST" "1970-12-28 BST"
#>  [53] "1971-01-04 BST" "1971-01-11 BST" "1971-01-18 BST" "1971-01-25 BST"
#>  [57] "1971-02-01 BST" "1971-02-08 BST" "1971-02-15 BST" "1971-02-22 BST"
#>  [61] "1971-03-01 BST" "1971-03-08 BST" "1971-03-15 BST" "1971-03-22 BST"
#>  [65] "1971-03-29 BST" "1971-04-05 BST" "1971-04-12 BST" "1971-04-19 BST"
#>  [69] "1971-04-26 BST" "1971-05-03 BST" "1971-05-10 BST" "1971-05-17 BST"
#>  [73] "1971-05-24 BST" "1971-05-31 BST" "1971-06-07 BST" "1971-06-14 BST"
#>  [77] "1971-06-21 BST" "1971-06-28 BST" "1971-07-05 BST" "1971-07-12 BST"
#>  [81] "1971-07-19 BST" "1971-07-26 BST" "1971-08-02 BST" "1971-08-09 BST"
#>  [85] "1971-08-16 BST" "1971-08-23 BST" "1971-08-30 BST" "1971-09-06 BST"
#>  [89] "1971-09-13 BST" "1971-09-20 BST" "1971-09-27 BST" "1971-10-04 BST"
#>  [93] "1971-10-11 BST" "1971-10-18 BST" "1971-10-25 BST" "1971-11-01 GMT"
#>  [97] "1971-11-08 GMT" "1971-11-15 GMT" "1971-11-22 GMT" "1971-11-29 GMT"

Created on 2020-02-24 by the reprex package (v0.3.0)
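A Date-based variant of the same idea avoids the time-zone labels entirely, since Date objects carry no time of day. The sketch below builds every Monday from the first Monday of 1970 to the last Monday of 2019 with seq(..., by = "week"); attach_mondays() is a hypothetical helper that only assigns the dates by position when the row count matches, which is the condition under which this approach is safe.

```r
# Every Monday from the first Monday of 1970 to the last Monday of 2019,
# as Date objects, so no time zone is attached
mondays <- seq(as.Date("1970-01-05"), as.Date("2019-12-30"), by = "week")
length(mondays)
head(mondays)

# Hypothetical helper: attach the Mondays by position, but only when the
# row count matches, i.e. the data has one row per Monday and no gaps
# (it cannot check that the rows are already in date order)
attach_mondays <- function(df, mondays) {
  if (nrow(df) != length(mondays)) {
    stop("row count differs from the number of Mondays; join on year and week instead")
  }
  df$date <- mondays
  df
}
```

If the check fails, matching on year and week (as in the other answers) is the more robust route.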

Using the lubridate package, you can calculate it as follows:

df <- data.frame(
  year = c(1970,1970,1970,1970,1970,1970,2019,2019,2019,2019),
  month = c(1,1,1,1,2,2,11,12,12,12),
  week = c(1,2,3,4,5,6,47,48,49,50)
)

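# 1 January of each year
df$year_first_day <- lubridate::ymd(paste(df$year, '0101', sep = ''))
# First Monday on or after 1 January. Dates that already sit on a week
# boundary are rounded up to the next week by default, so
# change_on_boundary = FALSE keeps 1 January when it is itself a Monday
df$year_first_monday <- lubridate::ceiling_date(
  df$year_first_day, unit = 'weeks', week_start = 1,
  change_on_boundary = FALSE
)
# Week n is n - 1 weeks after the first Monday
df$date <- lubridate::dweeks(df$week - 1) + df$year_first_monday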
df
#    year month week year_first_day year_first_monday       date
# 1  1970     1    1     1970-01-01        1970-01-05 1970-01-05
# 2  1970     1    2     1970-01-01        1970-01-05 1970-01-12
# 3  1970     1    3     1970-01-01        1970-01-05 1970-01-19
# 4  1970     1    4     1970-01-01        1970-01-05 1970-01-26
# 5  1970     2    5     1970-01-01        1970-01-05 1970-02-02
# 6  1970     2    6     1970-01-01        1970-01-05 1970-02-09
# 7  2019    11   47     2019-01-01        2019-01-07 2019-11-25
# 8  2019    12   48     2019-01-01        2019-01-07 2019-12-02
# 9  2019    12   49     2019-01-01        2019-01-07 2019-12-09
# 10 2019    12   50     2019-01-01        2019-01-07 2019-12-16
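Whichever approach you use, a quick base-R check can catch off-by-one-week errors. In the sketch below, check_dates() is a hypothetical helper that assumes, as the question's data suggest, that week n is the n-th Monday of the calendar year. It returns the rows whose date is not a Monday, falls outside the recorded month, or is not the n-th Monday of its year, so a row for week 1 of 2018 dated 8 January is flagged even though the month matches.

```r
# Hypothetical helper: return rows whose date fails any of three checks
check_dates <- function(df) {
  is_monday  <- format(df$date, "%u") == "1"
  same_month <- as.integer(format(df$date, "%m")) == df$month
  # Day of the year (1-366) -> which Monday of the year this date is
  nth_monday <- (as.integer(format(df$date, "%j")) - 1) %/% 7 + 1
  same_week  <- nth_monday == df$week
  df[!(is_monday & same_month & same_week), , drop = FALSE]
}

# Dates as printed in the output above
df <- data.frame(
  year  = c(1970, 1970, 1970, 1970, 1970, 1970, 2019, 2019, 2019, 2019),
  month = c(1, 1, 1, 1, 2, 2, 11, 12, 12, 12),
  week  = c(1, 2, 3, 4, 5, 6, 47, 48, 49, 50),
  date  = as.Date(c("1970-01-05", "1970-01-12", "1970-01-19", "1970-01-26",
                    "1970-02-02", "1970-02-09", "2019-11-25", "2019-12-02",
                    "2019-12-09", "2019-12-16"))
)
check_dates(df)    # zero rows: every date passes

# Week 1 of 2018 dated one week late (2018 starts on a Monday)
late <- data.frame(year = 2018, month = 1, week = 1, date = as.Date("2018-01-08"))
check_dates(late)  # one row: 8 January is the 2nd Monday of 2018
```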

Here is one idea. Notice that I only used the first six rows from your example for this demonstration; they are stored in dat, defined under DATA below.

library(dplyr)
library(lubridate)

date_seq <- tibble(
  # Create a data frame with dates from 1970 to 2019
  date = seq.Date(as.Date("1970-01-01"), as.Date("2019-12-31"), by = 1)
) %>%
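  # Create weekday (weekdays() returns locale-dependent names;
  # matching "Monday" assumes an English locale)
  mutate(weekday = weekdays(date)) %>%
  # Filter for Monday
  filter(weekday %in% "Monday") %>%
  # Create year, month
  mutate(year = year(date), month = month(date)) %>%
  # Create week number, restarting at 1 in each year so it matches
  # the week column in the data
  group_by(year) %>%
  mutate(week = row_number()) %>%
  ungroup() %>%
  # Join the data
  left_join(dat, by = c("year", "month", "week"))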
date_seq
# # A tibble: 2,609 x 6
#    date       weekday  year month  week    X1
#    <date>     <chr>   <dbl> <dbl> <int> <dbl>
#  1 1970-01-05 Monday   1970     1     1  0.21
#  2 1970-01-12 Monday   1970     1     2  0.22
#  3 1970-01-19 Monday   1970     1     3  0.34
#  4 1970-01-26 Monday   1970     1     4  0.34
#  5 1970-02-02 Monday   1970     2     5  0.35
#  6 1970-02-09 Monday   1970     2     6  0.25
#  7 1970-02-16 Monday   1970     2     7 NA   
#  8 1970-02-23 Monday   1970     2     8 NA   
#  9 1970-03-02 Monday   1970     3     9 NA   
# 10 1970-03-09 Monday   1970     3    10 NA   
# # ... with 2,599 more rows

DATA

dat <- read.table(text = "year  month  week    X1 
1970      1     1   0.21
1970      1     2   0.22
1970      1     3   0.34
1970      1     4   0.34
1970      2     5   0.35
1970      2     6   0.25",
                header = TRUE, stringsAsFactors = FALSE)
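For comparison, here is a base-R sketch of the same join without dplyr. It builds a calendar of Mondays, numbers them within each year with ave() so the week restarts at 1 every January, and merges on year, month and week. It redefines the six sample rows so it runs on its own; calendar and merged are hypothetical names. Unlike left_join() above, merge() here keeps only the observed weeks (an inner join).

```r
# Sample data from the question (first six rows)
dat <- read.table(text = "year month week X1
1970 1 1 0.21
1970 1 2 0.22
1970 1 3 0.34
1970 1 4 0.34
1970 2 5 0.35
1970 2 6 0.25", header = TRUE)

# All Mondays in 1970-2019 (1970-01-05 is the first)
mondays <- seq(as.Date("1970-01-05"), as.Date("2019-12-30"), by = "week")
calendar <- data.frame(
  date  = mondays,
  year  = as.integer(format(mondays, "%Y")),
  month = as.integer(format(mondays, "%m"))
)
# Number the Mondays within each year: 1, 2, ... restarting every January
calendar$week <- ave(seq_along(mondays), calendar$year, FUN = seq_along)

# Keep only the observed weeks and attach their dates
merged <- merge(dat, calendar, by = c("year", "month", "week"))
merged
```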

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.