Interpolate year-month-day from year, month, and week data in R
I inherited a dataset with measurements spanning 1970-2019. The head and tail look like this:
year month week X1
1970 1 1 0.21
1970 1 2 0.22
1970 1 3 0.34
1970 1 4 0.34
1970 2 5 0.35
1970 2 6 0.25
...
2019 11 47 0.063
2019 12 48 0.062
2019 12 49 0.068
2019 12 50 0.067
2019 12 51 0.074
2019 12 52 0.075
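Each observation of X1 is recorded on the first day of each week (i.e. Monday). I would like to create a date column in ISO 8601 format (yyyy-mm-dd). Given the year, month, and week, it should be possible to work out which day of the month each week's Monday falls on. Note: measurements were taken every Monday, regardless of holidays.
You can use base R: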
df <- data.frame(
year = c(1970,1970,1970,1970,1970,1970,2019,2019,2019,2019),
month = c(1,1,1,1,2,2,11,12,12,12),
week = c(1,2,3,4,5,6,47,48,49,50)
)
df$date_string <- paste(df$year,df$week,1, sep = "-")
df$date <- as.Date(x = df$date_string,format = "%Y-%U-%u")
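See: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/strptime
`%U` reads the week of the year (with weeks counted from Sunday), and the trailing `1`, read by `%u`, selects Monday within that week.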
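The choice between `%U` (weeks start on Sunday) and `%W` (weeks start on Monday) only matters in years whose 1 January is a Monday, such as 2018. This small sketch uses 2018 purely for illustration. It shows the two conventions landing a week apart, so it is worth checking the result against the `month` column of the real data:

```r
# Week 1, Monday (%u = 1) under each week-numbering convention
d_1970_u <- as.Date("1970-1-1", format = "%Y-%U-%u")  # "1970-01-05"
d_2018_u <- as.Date("2018-1-1", format = "%Y-%U-%u")  # "2018-01-08": week 1 starts on the first Sunday (7 Jan)
d_2018_w <- as.Date("2018-1-1", format = "%Y-%W-%u")  # "2018-01-01": week 1 starts on the first Monday (1 Jan)
print(c(d_1970_u, d_2018_u, d_2018_w))
```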
This really is just a one-liner. You can use the lubridate package to generate a vector of every Monday since 5 January 1970, like this:
as.POSIXct("1970-01-05") + lubridate::days(0:2616 * 7)
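This will take you up to today (24 February 2020, when this answer was posted).
Here is a reprex showing the first 100 Mondays since the start of 1970: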
head(as.POSIXct("1970-01-05") + lubridate::days(0:2616 * 7), 100)
#> [1] "1970-01-05 BST" "1970-01-12 BST" "1970-01-19 BST" "1970-01-26 BST"
#> [5] "1970-02-02 BST" "1970-02-09 BST" "1970-02-16 BST" "1970-02-23 BST"
#> [9] "1970-03-02 BST" "1970-03-09 BST" "1970-03-16 BST" "1970-03-23 BST"
#> [13] "1970-03-30 BST" "1970-04-06 BST" "1970-04-13 BST" "1970-04-20 BST"
#> [17] "1970-04-27 BST" "1970-05-04 BST" "1970-05-11 BST" "1970-05-18 BST"
#> [21] "1970-05-25 BST" "1970-06-01 BST" "1970-06-08 BST" "1970-06-15 BST"
#> [25] "1970-06-22 BST" "1970-06-29 BST" "1970-07-06 BST" "1970-07-13 BST"
#> [29] "1970-07-20 BST" "1970-07-27 BST" "1970-08-03 BST" "1970-08-10 BST"
#> [33] "1970-08-17 BST" "1970-08-24 BST" "1970-08-31 BST" "1970-09-07 BST"
#> [37] "1970-09-14 BST" "1970-09-21 BST" "1970-09-28 BST" "1970-10-05 BST"
#> [41] "1970-10-12 BST" "1970-10-19 BST" "1970-10-26 BST" "1970-11-02 BST"
#> [45] "1970-11-09 BST" "1970-11-16 BST" "1970-11-23 BST" "1970-11-30 BST"
#> [49] "1970-12-07 BST" "1970-12-14 BST" "1970-12-21 BST" "1970-12-28 BST"
#> [53] "1971-01-04 BST" "1971-01-11 BST" "1971-01-18 BST" "1971-01-25 BST"
#> [57] "1971-02-01 BST" "1971-02-08 BST" "1971-02-15 BST" "1971-02-22 BST"
#> [61] "1971-03-01 BST" "1971-03-08 BST" "1971-03-15 BST" "1971-03-22 BST"
#> [65] "1971-03-29 BST" "1971-04-05 BST" "1971-04-12 BST" "1971-04-19 BST"
#> [69] "1971-04-26 BST" "1971-05-03 BST" "1971-05-10 BST" "1971-05-17 BST"
#> [73] "1971-05-24 BST" "1971-05-31 BST" "1971-06-07 BST" "1971-06-14 BST"
#> [77] "1971-06-21 BST" "1971-06-28 BST" "1971-07-05 BST" "1971-07-12 BST"
#> [81] "1971-07-19 BST" "1971-07-26 BST" "1971-08-02 BST" "1971-08-09 BST"
#> [85] "1971-08-16 BST" "1971-08-23 BST" "1971-08-30 BST" "1971-09-06 BST"
#> [89] "1971-09-13 BST" "1971-09-20 BST" "1971-09-27 BST" "1971-10-04 BST"
#> [93] "1971-10-11 BST" "1971-10-18 BST" "1971-10-25 BST" "1971-11-01 GMT"
#> [97] "1971-11-08 GMT" "1971-11-15 GMT" "1971-11-22 GMT" "1971-11-29 GMT"
Created on 2020-02-24 by the reprex package (v0.3.0)
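The one-liner above uses `as.POSIXct()`, which creates date-times in the session's local time zone. That is why the printed values carry `BST`/`GMT` labels. If only calendar dates are needed, the following base R sketch is an alternative not taken from the original answer. It builds the same Mondays as `Date` objects, which have no time zone, and stops at 30 December 2019 (the last Monday of 2019) to match the dataset's range:

```r
# Every Monday from 5 January 1970 to 30 December 2019, as plain Dates
mondays <- seq(as.Date("1970-01-05"), as.Date("2019-12-30"), by = "week")

length(mondays)  # 2609
head(mondays, 3)
# format(..., "%u") gives the ISO weekday number (1 = Monday), independent of locale
all(format(mondays, "%u") == "1")
```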
Using the lubridate package, you can calculate it as follows:
df <- data.frame(
year = c(1970,1970,1970,1970,1970,1970,2019,2019,2019,2019),
month = c(1,1,1,1,2,2,11,12,12,12),
week = c(1,2,3,4,5,6,47,48,49,50)
)
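# 1 January of each row's year
df$year_first_day <- lubridate::ymd(paste(df$year, '0101', sep = ''))
# Round 1 January up to a week boundary, with weeks starting on Monday (week_start = 1)
df$year_first_monday <- lubridate::ceiling_date(df$year_first_day, unit = 'weeks', week_start = 1)
# Add (week - 1) weeks to the year's first Monday
df$date <- lubridate::dweeks(df$week - 1) + df$year_first_monday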
df
# year month week year_first_day year_first_monday date
# 1 1970 1 1 1970-01-01 1970-01-05 1970-01-05
# 2 1970 1 2 1970-01-01 1970-01-05 1970-01-12
# 3 1970 1 3 1970-01-01 1970-01-05 1970-01-19
# 4 1970 1 4 1970-01-01 1970-01-05 1970-01-26
# 5 1970 2 5 1970-01-01 1970-01-05 1970-02-02
# 6 1970 2 6 1970-01-01 1970-01-05 1970-02-09
# 7 2019 11 47 2019-01-01 2019-01-07 2019-11-25
# 8 2019 12 48 2019-01-01 2019-01-07 2019-12-02
# 9 2019 12 49 2019-01-01 2019-01-07 2019-12-09
# 10 2019 12 50 2019-01-01 2019-01-07 2019-12-16
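For comparison, here is a dependency-free base R sketch of the same arithmetic: the first Monday of the year plus `week - 1` weeks. The helper `monday_of_week()` is hypothetical. It assumes week 1 begins on the first Monday on or after 1 January. In years where 1 January is itself a Monday, this convention and the `%U`-based approach differ by one week. For that reason, the last line checks the computed dates against the `month` column that the data already has:

```r
# Hypothetical helper: the Monday of `week` in `year`, assuming week 1 starts
# on the first Monday on or after 1 January (an assumed convention)
monday_of_week <- function(year, week) {
  jan1 <- as.Date(paste0(year, "-01-01"))
  wday_jan1 <- as.POSIXlt(jan1)$wday          # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  first_monday <- jan1 + (8 - wday_jan1) %% 7 # days from 1 January to the first Monday
  first_monday + (week - 1) * 7
}

df <- data.frame(
  year  = c(1970, 1970, 1970, 1970, 1970, 1970, 2019, 2019, 2019, 2019),
  month = c(1, 1, 1, 1, 2, 2, 11, 12, 12, 12),
  week  = c(1, 2, 3, 4, 5, 6, 47, 48, 49, 50)
)
df$date <- monday_of_week(df$year, df$week)
print(df)

# Sanity check: does every computed date fall in the recorded month?
all(as.integer(format(df$date, "%m")) == df$month)
```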
Here is an idea. Note that in this demo I only used the first six rows of your example.
library(dplyr)
library(lubridate)
date_seq <- tibble(
  # Create a data frame with dates from 1970 to 2019
  date = seq.Date(as.Date("1970-01-01"), as.Date("2019-12-31"), by = 1)
) %>%
  # Create weekday (weekdays() returns names in the current locale)
  mutate(weekday = weekdays(date)) %>%
  # Filter for Monday
  filter(weekday %in% "Monday") %>%
  # Create year, month
  mutate(year = year(date), month = month(date)) %>%
  # Number the Mondays within each year, so week restarts at 1 every January
  group_by(year) %>%
  mutate(week = row_number()) %>%
  ungroup() %>%
  # Join the data
  left_join(dat, by = c("year", "month", "week"))
date_seq
# # A tibble: 2,609 x 6
# date weekday year month week X1
# <date> <chr> <dbl> <dbl> <int> <dbl>
# 1 1970-01-05 Monday 1970 1 1 0.21
# 2 1970-01-12 Monday 1970 1 2 0.22
# 3 1970-01-19 Monday 1970 1 3 0.34
# 4 1970-01-26 Monday 1970 1 4 0.34
# 5 1970-02-02 Monday 1970 2 5 0.35
# 6 1970-02-09 Monday 1970 2 6 0.25
# 7 1970-02-16 Monday 1970 2 7 NA
# 8 1970-02-23 Monday 1970 2 8 NA
# 9 1970-03-02 Monday 1970 3 9 NA
# 10 1970-03-09 Monday 1970 3 10 NA
# # ... with 2,599 more rows
Data
dat <- read.table(text = "year month week X1
1970 1 1 0.21
1970 1 2 0.22
1970 1 3 0.34
1970 1 4 0.34
1970 2 5 0.35
1970 2 6 0.25",
header = TRUE, stringsAsFactors = FALSE)
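The same calendar-join idea can also be sketched in base R without dplyr. This sketch assumes that week numbers restart at 1 with the first Monday of each year, which is how the question's data appears to count (week 47 falls in November 2019). It generates Mondays directly with `seq()` instead of filtering on `weekdays()`, whose output depends on the locale. The names `cal` and `res` are illustrative only:

```r
dat <- read.table(text = "year month week X1
1970 1 1 0.21
1970 1 2 0.22
1970 1 3 0.34
1970 1 4 0.34
1970 2 5 0.35
1970 2 6 0.25",
header = TRUE, stringsAsFactors = FALSE)

# Calendar of every Monday in the measurement period
mondays <- seq(as.Date("1970-01-05"), as.Date("2019-12-30"), by = "week")
cal <- data.frame(
  date  = mondays,
  year  = as.integer(format(mondays, "%Y")),
  month = as.integer(format(mondays, "%m"))
)
# Week number = position of the Monday within its year (1, 2, 3, ...)
cal$week <- ave(cal$year, cal$year, FUN = seq_along)

# Left join: keep every Monday, attach X1 where the data has it
res <- merge(cal, dat, by = c("year", "month", "week"), all.x = TRUE)
head(res[order(res$date), ], 8)
```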