
Match ISO 8601 week-of-year numbers to month-of-year numbers on Windows with German locale

This is directly related to my question "POSIX date from dates in weekly time format".

However, in this question I'd like to ask specifically how to map ISO 8601 week numbers to month-of-year numbers.

To me, it seems that this is either not possible or involves some non-intuitive hacks (and even these don't work reliably), and IMO it should therefore be considered something that needs to be fixed in base R. Please correct me if I'm wrong, though.

EDIT: it seems the issue is closely related to running on Windows and/or to the locale you're on (standard German, in my case).

posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

ISO 8601

(yw <- format(posix, "%Y-%V"))
# [1] "2015-52" "2015-53" "2016-53" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# [1] "2015-01-12 CET" "2015-01-12 CET" "2016-01-12 CET" "2016-01-12 CET"
# -> utterly wrong!!!

ywd <- sprintf("%s-4", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# -> still wrong -> the day of the week is not the reason

# -> no way to use ISO 8601 convention to map week of the year to month of the year

For the sake of due diligence: it's also not possible when trying to use the US or UK conventions:

US convention

(yw <- format(posix, "%Y-%U"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# -> does not work for week 00 either
# -> the day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year

UK convention

(yw <- format(posix, "%Y-%W"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# -> does not work for week 00 either
# -> the day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year
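For comparison, the %U and %W definitions can be applied by hand with Date arithmetic, sidestepping strptime() entirely. This is a minimal sketch, not a fix for the parsing behaviour: `week_day_to_date()` is a hypothetical helper written for illustration, and it assumes the week definitions documented in ?strptime and %u-style weekday numbers (1 = Monday through 7 = Sunday). It also shows something about week 00: in 2016 the Monday and the Thursday of week 00 fall in December 2015, so "2016-00-1" and "2016-00-4" do not name any day of 2016.

```r
# Hypothetical helper (not part of base R or any package): the calendar date
# of a given weekday inside a %U or %W week, computed by Date arithmetic
# instead of parsing with strptime(). Week definitions as in ?strptime:
#   %U: week 01 begins on the first Sunday of the year (US convention)
#   %W: week 01 begins on the first Monday of the year (UK convention)
# Days before that first Sunday/Monday belong to week 00.
week_day_to_date <- function(year, week, wday, convention = c("U", "W")) {
  convention <- match.arg(convention)
  # 0 = weeks start on Sunday (%U), 1 = weeks start on Monday (%W)
  start_wday <- if (convention == "U") 0L else 1L
  jan1 <- as.Date(sprintf("%04d-01-01", as.integer(year)))
  # how many days January 1 lies after the start of its week (0..6)
  jan1_offset <- (as.POSIXlt(jan1)$wday - start_wday) %% 7L
  # first day (Sunday or Monday) of week 01
  week1_start <- jan1 + (7L - jan1_offset) %% 7L
  # position of the requested weekday (%u style: 1 = Monday, 7 = Sunday)
  day_offset <- (as.integer(wday) - start_wday) %% 7L
  week1_start + 7L * (as.integer(week) - 1L) + day_offset
}

# The OP's failing inputs "2016-00-1" and "2016-00-4" under %W
d <- week_day_to_date(2016, c(0, 0), c(1, 4), convention = "W")
d                # "2015-12-28" "2015-12-31"
format(d, "%Y")  # both fall in 2015, not 2016
```

The same two inputs give the same dates with `convention = "U"`. Only a few days of early January 2016 belong to week 00 at all: January 1-2 under %U and January 1-3 under %W.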

Session info

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=German_Germany.1252     LC_CTYPE=German_Germany.1252       LC_MONETARY=German_Germany.1252   
[4] LC_NUMERIC=C                       LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fva_0.1.0       digest_0.6.10   readxl_0.1.1    dplyr_0.5.0     plyr_1.8.4      magrittr_1.5   
 [7] memoise_1.0.0   testthat_1.0.2  roxygen2_5.0.1  devtools_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8     lubridate_1.6.0 assertthat_0.1  packrat_0.4.8-1 crayon_1.3.2    withr_1.0.2    
 [7] R6_2.2.0        DBI_0.5-1       stringi_1.1.2   rstudioapi_0.6  tools_3.3.2     stringr_1.1.0  
[13] tibble_1.2     

> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language en                          
 collate  German_Germany.1252         
 tz       Europe/Berlin               
 date     2017-01-12                  

Packages ---------------------------------------------------------------------------------------------------
 package    * version date       source        
 assertthat   0.1     2013-12-06 CRAN (R 3.3.2)
 crayon       1.3.2   2016-06-28 CRAN (R 3.3.2)
 DBI          0.5-1   2016-09-10 CRAN (R 3.3.2)
 devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.2)
 digest     * 0.6.10  2016-08-02 CRAN (R 3.3.2)
 dplyr      * 0.5.0   2016-06-24 CRAN (R 3.3.2)
 fva        * 0.1.0   <NA>       local         
 lubridate    1.6.0   2016-09-13 CRAN (R 3.3.2)
 magrittr   * 1.5     2014-11-22 CRAN (R 3.3.2)
 memoise    * 1.0.0   2016-01-29 CRAN (R 3.3.2)
 packrat      0.4.8-1 2016-09-07 CRAN (R 3.3.2)
 plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.2)
 R6           2.2.0   2016-10-05 CRAN (R 3.3.2)
 Rcpp         0.12.8  2016-11-17 CRAN (R 3.3.2)
 readxl     * 0.1.1   2016-03-28 CRAN (R 3.3.2)
 roxygen2   * 5.0.1   2015-11-11 CRAN (R 3.3.2)
 stringi      1.1.2   2016-10-01 CRAN (R 3.3.2)
 stringr      1.1.0   2016-08-19 CRAN (R 3.3.2)
 testthat   * 1.0.2   2016-04-23 CRAN (R 3.3.2)
 tibble       1.2     2016-08-26 CRAN (R 3.3.2)
 withr        1.0.2   2016-06-20 CRAN (R 3.3.2)

Disclosure: As mentioned in this answer, I have created the ISOweek package to deal with ISO 8601 week-based dates.

The question contains two flaws:

  1. The ISO 8601 week-based year is different from the calendar year.
  2. Without specifying a day of week, the conversion of year-week to year-month is ambiguous.

Week-based year vs calendar year

The OP has created sample data using

posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))
(yw <- format(posix, "%Y-%V"))
 [1] "2015-52" "2015-53" "2016-53" "2016-01"

The format specification %Y returns the calendar year, which is wrong for the third element: 2016-01-01 (a Friday) belongs to week 53 of the ISO week-based year 2015.

With the correct format specification %G we do get

(yw <- format(posix, "%G-%V"))
 [1] "2015-52" "2015-53" "2015-53" "2016-01"
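As a cross-check that does not depend on the platform's %G/%V support, the same week-based year and week can be computed in base R with the Thursday rule: every ISO week belongs to the week-based year that contains its Thursday. This is a minimal sketch; `iso_year_week()` is a hypothetical helper (not from base R or the ISOweek package). It is fed character dates because as.Date() on a POSIXct may convert via UTC and shift midnight CET timestamps back by one day.

```r
# Hypothetical helper: ISO 8601 week-based year and week number via the
# "Thursday rule" (an ISO week belongs to the year containing its Thursday).
# Uses only Date arithmetic and POSIXlt fields, not %G/%V in format().
iso_year_week <- function(x) {
  x <- as.Date(x)
  # days since Monday: 0 = Monday ... 6 = Sunday
  days_since_monday <- (as.POSIXlt(x)$wday + 6L) %% 7L
  # Thursday of the same ISO week
  thursday <- x - days_since_monday + 3L
  lt <- as.POSIXlt(thursday)
  iso_year <- lt$year + 1900L
  # yday is 0-based; the Thursday of week 01 always falls on January 1-7
  iso_week <- lt$yday %/% 7L + 1L
  sprintf("%d-%02d", iso_year, iso_week)
}

iso_year_week(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))
# "2015-52" "2015-53" "2015-53" "2016-01"
```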

Conversion of week-of-the-year to month-of-the-year

Just providing the ISO week-based year and week number without the day of week will yield ambiguous results.

This can be demonstrated with the (corrected) sample data, which now contain three distinct consecutive weeks in the OP's own (non-standard) year-week format:

(yw <- unique(yw))
 [1] "2015-52" "2015-53" "2016-01"

With the help of the ISOweek2date() function from the ISOweek package, the data are converted to calendar dates. Note that ISOweek2date() requires a full ISO 8601 week-based date in the format yyyy-Www-d, including the day of week. If we choose the first day of the week (Monday), we get:

library(ISOweek)
library(magrittr)
yw %>% 
  # insert "W" to conform with ISO 8601 format
  sub("-", "-W", .) %>% 
  # append day of week
  paste0("-1") %>%
  # convert to class Date and print as yyyy-mm 
  ISOweek2date() %>% 
  format("%Y-%m")
 [1] "2015-12" "2015-12" "2016-01"

Now, we repeat this using the last day of the week (Sunday):

yw %>% 
  sub("-", "-W", .) %>% 
  paste0("-7") %>% 
  ISOweek2date() %>% 
  format("%Y-%m")
 [1] "2015-12" "2016-01" "2016-01"

Note that the second element now refers to January 2016 instead of December 2015, because the Sunday of week 53 is in January while the Monday of this week is still in December.

The R documentation for date-time format arguments (?strptime) says that "%V" is ignored on input.
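Because %V is ignored on input, one workaround is to split the year-week string yourself and compute the date with Date arithmetic. This is a minimal sketch, assuming the year is the ISO week-based year (as produced by %G, not %Y) and the week is an ISO week number; `iso_week_date()` is a hypothetical helper, not a base R function. It reproduces the Monday and Sunday results obtained with ISOweek2date() above.

```r
# Hypothetical helper: calendar date of the ISO 8601 week date year-Www-d,
# computed by Date arithmetic because strptime() ignores %V on input.
iso_week_date <- function(year, week, wday = 1L) {
  # January 4 always lies in ISO week 01 of its week-based year
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(year)))
  # days since Monday: 0 = Monday ... 6 = Sunday
  jan4_offset <- (as.POSIXlt(jan4)$wday + 6L) %% 7L
  week1_monday <- jan4 - jan4_offset
  # wday follows %u: 1 = Monday ... 7 = Sunday
  week1_monday + 7L * (as.integer(week) - 1L) + (as.integer(wday) - 1L)
}

# Year-week strings as produced by format(x, "%G-%V")
yw <- c("2015-52", "2015-53", "2016-01")
parts <- do.call(rbind, strsplit(yw, "-", fixed = TRUE))
year <- as.integer(parts[, 1])
week <- as.integer(parts[, 2])

format(iso_week_date(year, week, 1), "%Y-%m")  # Monday: "2015-12" "2015-12" "2016-01"
format(iso_week_date(year, week, 7), "%Y-%m")  # Sunday: "2015-12" "2016-01" "2016-01"
```

Which weekday to pick remains a convention. Choosing Thursday (`wday = 4`) assigns each week to the month that contains at least four of its seven days, mirroring how ISO 8601 assigns weeks to years; this is only one option, not something either answer prescribes.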

Pretty sure something besides base R needs changing (see the note at the end, though):

some_dates <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

(year_week <- format(some_dates, "%Y %U"))
## [1] "2015 51" "2015 52" "2016 00" "2016 01"

(year_week_day <- sprintf("%s 1", year_week))
## [1] "2015 51 1" "2015 52 1" "2016 00 1" "2016 01 1"

(as.POSIXct(year_week_day, format = "%Y %U %u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

It works with the dashes, too:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

and, although dashes are valid ISO form, they can confuse readers whenever the week values are small enough (0 to 12) to be mistaken for month numbers.

NOTE

As the comment thread indicates, this is the behaviour on Windows:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 PST" "2015-12-28 PST" NA               "2016-01-04 PST"

(Windows 10 64-bit, R 3.3.2 for this example)
