
Interpolate year-month-day from year, month, and week data in R

I inherited a dataset with measurements spanning 1970 to 2019. The head and tail look like this:

year  month  week    X1 
1970      1     1   0.21
1970      1     2   0.22
1970      1     3   0.34
1970      1     4   0.34
1970      2     5   0.35
1970      2     6   0.25
... 
2019     11    47   0.063
2019     12    48   0.062
2019     12    49   0.068
2019     12    50   0.067
2019     12    51   0.074
2019     12    52   0.075

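Each observation of X1 was recorded on the first day of the week (i.e. Monday). I want to create a date column in ISO 8601 format (yyyy-mm-dd). Given the year, month and week, it should be possible to work out which day of the month each week's Monday falls on. Note: measurements were taken every Monday, regardless of holidays.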

You can use base R:

df <- data.frame(
  year = c(1970,1970,1970,1970,1970,1970,2019,2019,2019,2019),
  month = c(1,1,1,1,2,2,11,12,12,12),
  week = c(1,2,3,4,5,6,47,48,49,50)
)

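# Build "year-week-weekday" strings; weekday 1 (%u) is Monday
df$date_string <- paste(df$year, df$week, 1, sep = "-")
# %U is the week of the year, with weeks starting on Sunday
df$date <- as.Date(x = df$date_string, format = "%Y-%U-%u")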

See: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/strptime

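'%U' parses the week of the year (weeks counted from Sunday), and '%u' is the weekday (1 = Monday), so the trailing 1 selects the Monday of that week.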
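A small sketch of what the choice of '%U' implies, using the same strptime formats as above: '%U' counts weeks from the first Sunday of the year, while '%W' counts them from the first Monday. The two only disagree in years where 1 January is itself a Monday (2018, for example), where '%U' places that Monday in week 0. Which one matches the dataset depends on how its weeks were numbered, which the question does not say; the variable names below are just illustrative.

```r
# Monday (%u = 1) of week 1 in 1970 and in 2018, parsed two ways
x <- c("1970-1-1", "2018-1-1")
week1_sunday_based <- as.Date(x, format = "%Y-%U-%u")  # %U: weeks start on Sunday
week1_monday_based <- as.Date(x, format = "%Y-%W-%u")  # %W: weeks start on Monday
data.frame(x, week1_sunday_based, week1_monday_based)
```

For 1970 both give 1970-01-05; for 2018, '%U' gives 2018-01-08 and '%W' gives 2018-01-01.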

This is really just a one-liner. You can use the lubridate package to generate a vector of every Monday since 5 January 1970, like this:

as.POSIXct("1970-01-05") + lubridate::days(0:2616 * 7)

This takes you up to today (0:2616 gives 2,617 Mondays, the last one being 2020-02-24).

Here is a reprex showing the first 100 Mondays since the start of 1970:

head(as.POSIXct("1970-01-05") + lubridate::days(0:2616 * 7), 100)
#>   [1] "1970-01-05 BST" "1970-01-12 BST" "1970-01-19 BST" "1970-01-26 BST"
#>   [5] "1970-02-02 BST" "1970-02-09 BST" "1970-02-16 BST" "1970-02-23 BST"
#>   [9] "1970-03-02 BST" "1970-03-09 BST" "1970-03-16 BST" "1970-03-23 BST"
#>  [13] "1970-03-30 BST" "1970-04-06 BST" "1970-04-13 BST" "1970-04-20 BST"
#>  [17] "1970-04-27 BST" "1970-05-04 BST" "1970-05-11 BST" "1970-05-18 BST"
#>  [21] "1970-05-25 BST" "1970-06-01 BST" "1970-06-08 BST" "1970-06-15 BST"
#>  [25] "1970-06-22 BST" "1970-06-29 BST" "1970-07-06 BST" "1970-07-13 BST"
#>  [29] "1970-07-20 BST" "1970-07-27 BST" "1970-08-03 BST" "1970-08-10 BST"
#>  [33] "1970-08-17 BST" "1970-08-24 BST" "1970-08-31 BST" "1970-09-07 BST"
#>  [37] "1970-09-14 BST" "1970-09-21 BST" "1970-09-28 BST" "1970-10-05 BST"
#>  [41] "1970-10-12 BST" "1970-10-19 BST" "1970-10-26 BST" "1970-11-02 BST"
#>  [45] "1970-11-09 BST" "1970-11-16 BST" "1970-11-23 BST" "1970-11-30 BST"
#>  [49] "1970-12-07 BST" "1970-12-14 BST" "1970-12-21 BST" "1970-12-28 BST"
#>  [53] "1971-01-04 BST" "1971-01-11 BST" "1971-01-18 BST" "1971-01-25 BST"
#>  [57] "1971-02-01 BST" "1971-02-08 BST" "1971-02-15 BST" "1971-02-22 BST"
#>  [61] "1971-03-01 BST" "1971-03-08 BST" "1971-03-15 BST" "1971-03-22 BST"
#>  [65] "1971-03-29 BST" "1971-04-05 BST" "1971-04-12 BST" "1971-04-19 BST"
#>  [69] "1971-04-26 BST" "1971-05-03 BST" "1971-05-10 BST" "1971-05-17 BST"
#>  [73] "1971-05-24 BST" "1971-05-31 BST" "1971-06-07 BST" "1971-06-14 BST"
#>  [77] "1971-06-21 BST" "1971-06-28 BST" "1971-07-05 BST" "1971-07-12 BST"
#>  [81] "1971-07-19 BST" "1971-07-26 BST" "1971-08-02 BST" "1971-08-09 BST"
#>  [85] "1971-08-16 BST" "1971-08-23 BST" "1971-08-30 BST" "1971-09-06 BST"
#>  [89] "1971-09-13 BST" "1971-09-20 BST" "1971-09-27 BST" "1971-10-04 BST"
#>  [93] "1971-10-11 BST" "1971-10-18 BST" "1971-10-25 BST" "1971-11-01 GMT"
#>  [97] "1971-11-08 GMT" "1971-11-15 GMT" "1971-11-22 GMT" "1971-11-29 GMT"

Created on 2020-02-24 by the reprex package (v0.3.0)
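The output above is POSIXct in the author's local UK time zone, which is why the dates carry BST/GMT labels (the UK kept British Standard Time all year from 1968 to 1971). A minimal base-R sketch that keeps plain Date values instead and rebuilds year, month and week, so the sequence can be joined to the data. It assumes week numbering restarts at 1 on the first Monday of each year, which the question does not confirm; the names mondays and lookup are hypothetical.

```r
# Every Monday from 5 January 1970 (the first Monday of 1970) to the end of 2019,
# stored as Date, so no time zone is attached
mondays <- seq(as.Date("1970-01-05"), as.Date("2019-12-31"), by = "week")

# Rebuild year, month and week so the sequence can be joined to the data
# (assumption: week numbering restarts at 1 on the first Monday of each year)
year  <- as.integer(format(mondays, "%Y"))
month <- as.integer(format(mondays, "%m"))
week  <- ave(seq_along(mondays), year, FUN = seq_along)
lookup <- data.frame(year = year, month = month, week = week, date = mondays)

head(lookup)
```

With the measurements in a data frame such as dat, merge(dat, lookup, by = c("year", "month", "week"), all.x = TRUE) would then attach a date to each row.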

With the lubridate package, you can compute it as follows:

df <- data.frame(
  year = c(1970,1970,1970,1970,1970,1970,2019,2019,2019,2019),
  month = c(1,1,1,1,2,2,11,12,12,12),
  week = c(1,2,3,4,5,6,47,48,49,50)
)

# First day of each year, e.g. "19700101" -> 1970-01-01
df$year_first_day <- lubridate::ymd(paste(df$year, '0101', sep = ''))
# First Monday after 1 January (week_start = 1 means weeks start on Monday)
df$year_first_monday <- lubridate::ceiling_date(df$year_first_day, unit = 'weeks', week_start = 1)
# Add (week - 1) whole calendar weeks to that Monday
df$date <- df$year_first_monday + lubridate::weeks(df$week - 1)
df
#    year month week year_first_day year_first_monday       date
# 1  1970     1    1     1970-01-01        1970-01-05 1970-01-05
# 2  1970     1    2     1970-01-01        1970-01-05 1970-01-12
# 3  1970     1    3     1970-01-01        1970-01-05 1970-01-19
# 4  1970     1    4     1970-01-01        1970-01-05 1970-01-26
# 5  1970     2    5     1970-01-01        1970-01-05 1970-02-02
# 6  1970     2    6     1970-01-01        1970-01-05 1970-02-09
# 7  2019    11   47     2019-01-01        2019-01-07 2019-11-25
# 8  2019    12   48     2019-01-01        2019-01-07 2019-12-02
# 9  2019    12   49     2019-01-01        2019-01-07 2019-12-09
# 10 2019    12   50     2019-01-01        2019-01-07 2019-12-16
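Two notes on this approach, sketched in base R. First, the month column is redundant given year and week, so it makes a free consistency check on the derived dates. Second, as I read the lubridate docs, ceiling_date() on a Date rounds up even when the date already sits on a week boundary (the default change_on_boundary = NULL), so in a year that starts on a Monday (1973, 1979, 1990, 1996, 2001, 2007, 2018) it returns 8 January rather than 1 January. The hypothetical helper monday_of_week() below counts from the first Monday on or after 1 January instead; which convention the dataset uses is not stated in the question.

```r
# Hypothetical helper (not part of lubridate): the Monday of `week` in `year`,
# assuming week 1 starts on the first Monday on or after 1 January
monday_of_week <- function(year, week) {
  jan1 <- as.Date(sprintf("%d-01-01", as.integer(year)))
  dow  <- as.integer(format(jan1, "%u"))   # 1 = Monday ... 7 = Sunday
  # (8 - dow) %% 7 is 0 when 1 January is itself a Monday
  jan1 + (8 - dow) %% 7 + (week - 1) * 7
}

df <- data.frame(
  year = c(1970,1970,1970,1970,1970,1970,2019,2019,2019,2019),
  month = c(1,1,1,1,2,2,11,12,12,12),
  week = c(1,2,3,4,5,6,47,48,49,50)
)
df$date <- monday_of_week(df$year, df$week)

# The month column is redundant, so use it to check the derived dates
stopifnot(as.integer(format(df$date, "%m")) == df$month)
df
```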

Here is one idea. Note that in this demo I only used the first six rows of your example.

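library(dplyr)
library(lubridate)

date_seq <- tibble(
  # Create a data frame with every date from 1970 to 2019
  date = seq.Date(as.Date("1970-01-01"), as.Date("2019-12-31"), by = 1)
) %>%
  # Create weekday (the names depend on the locale)
  mutate(weekday = weekdays(date)) %>%
  # Keep Mondays only; wday() with week_start = 1 numbers Monday as 1 in any locale
  filter(wday(date, week_start = 1) == 1) %>%
  # Create year, month
  mutate(year = year(date), month = month(date)) %>%
  # Create week number, restarting at 1 in each year (the data's weeks reset yearly)
  group_by(year) %>%
  mutate(week = row_number()) %>%
  ungroup() %>%
  # Join the data (dat is defined under "Data" below)
  left_join(dat, by = c("year", "month", "week"))
date_seq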
# # A tibble: 2,609 x 6
#    date       weekday  year month  week    X1
#    <date>     <chr>   <dbl> <dbl> <int> <dbl>
#  1 1970-01-05 Monday   1970     1     1  0.21
#  2 1970-01-12 Monday   1970     1     2  0.22
#  3 1970-01-19 Monday   1970     1     3  0.34
#  4 1970-01-26 Monday   1970     1     4  0.34
#  5 1970-02-02 Monday   1970     2     5  0.35
#  6 1970-02-09 Monday   1970     2     6  0.25
#  7 1970-02-16 Monday   1970     2     7 NA   
#  8 1970-02-23 Monday   1970     2     8 NA   
#  9 1970-03-02 Monday   1970     3     9 NA   
# 10 1970-03-09 Monday   1970     3    10 NA   
# # ... with 2,599 more rows

Data

dat <- read.table(text = "year  month  week    X1 
1970      1     1   0.21
1970      1     2   0.22
1970      1     3   0.34
1970      1     4   0.34
1970      2     5   0.35
1970      2     6   0.25",
                header = TRUE, stringsAsFactors = FALSE)
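One more thing worth checking before any of these joins, as a base-R sketch: the data shows weeks running up to 52, but some calendar years between 1970 and 2019 contain 53 Mondays. With a week counter that restarts each year, those years reach week 53, so one Monday in each of them has no matching row in a 52-week scheme. The names below are illustrative only.

```r
# Every day from 1970 to 2019, then only the Mondays (%u == 1)
days    <- seq(as.Date("1970-01-01"), as.Date("2019-12-31"), by = "day")
mondays <- days[format(days, "%u") == "1"]

# Number of Mondays in each calendar year
per_year <- table(format(mondays, "%Y"))

# Years with 53 Mondays, where a 1-52 week numbering leaves one Monday over
years_53 <- names(per_year)[per_year == 53]
years_53
```

These are the years that start on a Monday, plus leap years that start on a Sunday; the total of 2,609 Mondays matches the row count of the tibble above.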

