
How to Parse Year + Week Number in R?

Is there a good way to convert a year + week number into a date in R? I have tried the following:

> as.POSIXct("2008 41", format="%Y %U")
[1] "2008-02-21 EST"
> as.POSIXct("2008 42", format="%Y %U")
[1] "2008-02-21 EST"

According to ?strftime:

%Y Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC): see http://en.wikipedia.org/wiki/0_(year). Note that the standard also says that years before 1582 in its calendar should only be used with agreement of the parties involved.

%U Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.

This is kinda like another question you may have seen before. :)

The key issue is: which day should a week number specify? Is it the first day of the week? The last? That's ambiguous. Week one might start on the first day of the year, on the 7th day of the year, or on the first Sunday or Monday of the year (a frequent interpretation). (And it's worse than that: these generally appear to be 0-indexed, rather than 1-indexed.) So an enumerated day of the week needs to be specified.

For instance, try this:

as.POSIXlt("2008 42 1", format = "%Y %U %u")

The %u indicator specifies the day of the week (1-7, with Monday as 1).
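As a rough illustration of why the weekday matters, the sketch below parses the same string with as.Date() (a swap for the as.POSIXlt() call above, made only to keep time zones out of the picture) and then asks which weekday and week the resulting date falls in. The variable names are made up for this example.

```r
# Year 2008, week 42 (%U: weeks start on Sunday), weekday 1 (%u: Monday).
parsed <- as.Date("2008 42 1", format = "%Y %U %u")
print(parsed)                  # 2008-10-20

# Ask the parsed date which weekday and %U week it belongs to.
parsed_wday <- format(parsed, "%u")   # "1" means Monday
parsed_week <- format(parsed, "%U")   # Sunday-based week number
print(c(wday = parsed_wday, week = parsed_week))
```

With the weekday supplied, the string pins down exactly one day: the Monday of the week that starts on Sunday, 19 October 2008.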

Additional note: See ?strptime for the various options for format conversion. Be careful about the enumeration of weeks, as they can be split across the end of the year, and day 1 is ambiguous: is it specified based on a Sunday or Monday, or from the first day of the year? This should all be specified and tested on the different systems where the R code will run. I'm not certain that Windows and POSIX systems sing the same tune on some of these conversions, hence I'd test and test again.
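One way to do that kind of testing is a round trip: parse a year/week/weekday string, format the result with the same codes, and compare. The helper below is a hypothetical sketch (check_week_roundtrip is not an existing R function), and the weeks tried are arbitrary picks from 2008.

```r
# Hypothetical helper: parse "year week weekday" with %Y %U %u and format it back.
check_week_roundtrip <- function(year, week, wday) {
  input  <- sprintf("%d %02d %d",
                    as.integer(year), as.integer(week), as.integer(wday))
  parsed <- as.Date(input, format = "%Y %U %u")
  output <- format(parsed, "%Y %U %u")
  data.frame(input, parsed, output,
             same = !is.na(parsed) & output == input,
             stringsAsFactors = FALSE)
}

# Mondays of a few weeks in 2008, including weeks near both ends of the year.
result <- check_week_roundtrip(2008, c(1, 26, 42, 52), 1)
print(result)
```

Running the same check on each target system, and extending it to week 00 and to weekdays other than Monday, would show whether the platforms actually agree.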

Day-of-week == zero in the POSIXlt DateTimeClasses system is Sunday. Not exactly Biblical, and not in agreement with R's convention of indexing from 1 either, but that's what it is. Week zero is the first (partial) week in the year. Week one (but day of week zero) starts with the first Sunday. Most of the other sequence types in POSIXlt (such as $mon and $yday) also have 0 as their starting point; $mday is the exception and starts at 1. It's kind of interesting to see what coercing the list elements of POSIXlt objects does. The only way you can actually change a POSIXlt date is to alter the $year, the $mon or the $mday elements. The others seem to be epiphenomena.

today <- as.POSIXlt(Sys.Date())
today  # Tuesday
# [1] "2012-02-21 UTC"
today$wday <- 0  # attempt to make it Sunday
today
# [1] "2012-02-21 UTC"   The attempt fails
today$mday <- 19
today
# [1] "2012-02-19 UTC"   Success
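The same experiment can be made reproducible by fixing the date instead of calling Sys.Date(). The sketch below lists the stored components and then rebuilds the derived fields after changing $mday. Going through as.POSIXct() and back is one way to renormalise, chosen here as an assumption rather than something the answer above prescribes.

```r
# A fixed date is used instead of Sys.Date() so the values are reproducible.
day <- as.POSIXlt("2012-02-21", tz = "UTC")   # a Tuesday

# Components as stored in the POSIXlt list.
print(c(year = day$year + 1900,  # stored as years since 1900
        mon  = day$mon,          # 0 = January
        mday = day$mday,         # day of month, 1-based
        wday = day$wday,         # 0 = Sunday
        yday = day$yday))        # 0 = 1 January

# Changing $mday moves the date; converting to POSIXct and back
# recomputes the derived fields such as $wday.
moved <- day
moved$mday <- 19
moved <- as.POSIXlt(as.POSIXct(moved))
print(moved$wday)                # 0, i.e. Sunday
```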

I did not come up with this myself (it's taken from a blog post by Forester), but I thought I'd add it to the answer list because it's the first implementation of the ISO 8601 week number convention that I've seen in R.

No doubt, week numbers are a very ambiguous topic, but I prefer an ISO standard over the current implementation of week numbers via format(..., "%U"), because the ISO convention seems to be what most people agree on, at least in Germany (calendars etc.).

I've put the actual function def at the bottom so you can focus on the output first. Also, I just stumbled across the package ISOweek, which may be worth a try.

Approach Comparison

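# Assumed setup (not shown in the original post): the days of January 2012
# and their week numbers as given by format(..., "%U")
posix   <- seq(as.Date("2012-01-01"), as.Date("2012-01-31"), by="day")
weeknum <- as.numeric(format(posix, "%U"))

x.days  <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
x.names <- sapply(seq_along(posix), function(x) {
    x.day <- as.POSIXlt(posix[x], tz="Europe/Berlin")$wday
    if (x.day == 0) {
        x.day <- 7   # POSIXlt counts Sunday as 0; map it to 7
    }
    x.days[x.day]
})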

data.frame(
    posix, 
    name=x.names,
    week.r=weeknum, 
    week.iso=ISOweek(as.character(posix), tzone="Europe/Berlin")$weeknum
)

# Result

        posix name week.r week.iso
1  2012-01-01  Sun      1  4480458
2  2012-01-02  Mon      1        1
3  2012-01-03  Tue      1        1
4  2012-01-04  Wed      1        1
5  2012-01-05  Thu      1        1
6  2012-01-06  Fri      1        1
7  2012-01-07  Sat      1        1
8  2012-01-08  Sun      2        1
9  2012-01-09  Mon      2        2
10 2012-01-10  Tue      2        2
11 2012-01-11  Wed      2        2
12 2012-01-12  Thu      2        2
13 2012-01-13  Fri      2        2
14 2012-01-14  Sat      2        2
15 2012-01-15  Sun      3        2
16 2012-01-16  Mon      3        3
17 2012-01-17  Tue      3        3
18 2012-01-18  Wed      3        3
19 2012-01-19  Thu      3        3
20 2012-01-20  Fri      3        3
21 2012-01-21  Sat      3        3
22 2012-01-22  Sun      4        3
23 2012-01-23  Mon      4        4
24 2012-01-24  Tue      4        4
25 2012-01-25  Wed      4        4
26 2012-01-26  Thu      4        4
27 2012-01-27  Fri      4        4
28 2012-01-28  Sat      4        4
29 2012-01-29  Sun      5        4
30 2012-01-30  Mon      5        5
31 2012-01-31  Tue      5        5
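For a cross-check of the ISO column, base R's format() also accepts %G (ISO week-based year) and %V (ISO week number) on output. This is a different technique from the blog-post function used above, and it assumes the platform's date formatting supports these two codes. The data frame and column names are made up for the sketch.

```r
# First ten days of January 2012, as in the table above.
days <- seq(as.Date("2012-01-01"), by = "day", length.out = 10)

comparison <- data.frame(
  date     = days,
  week.U   = format(days, "%U"),  # Sunday-based week, 00-53
  iso.year = format(days, "%G"),  # ISO 8601 week-based year
  iso.week = format(days, "%V"),  # ISO 8601 week, 01-53
  stringsAsFactors = FALSE
)
print(comparison)
```

Under ISO 8601, Sunday 1 January 2012 still belongs to week 52 of 2011, and week 1 of 2012 starts on Monday 2 January.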

Function Def

It's taken directly from the blog post; I've just changed a couple of minor things. The function is still kind of sketchy (e.g. the week number of the first date is far off), but I find it to be a nice start!

ISOweek <- function(
    date, 
    format="%Y-%m-%d", 
    tzone="UTC", 
    return.val="weekofyear"
){
  ##converts dates into "dayofyear" or "weekofyear", the latter providing the ISO-8601 week
  ##date should be a vector of class Date or a vector of formatted character strings
  ##format refers to the date form used if a vector of
  ##  character strings  is supplied

  ##convert date to POSIXt format 
  if(class(date)[1]%in%c("Date","character")){
    date=as.POSIXlt(date,format=format, tz=tzone)
  }

  if (!inherits(date, "POSIXt")) {
    ##stop with an error for unsupported input
    stop("Date is of wrong format.")
  }else if(inherits(date, "POSIXct")){
    ##POSIXct input is converted to POSIXlt so that $year, $wday etc. are available
    date=as.POSIXlt(date, tz=tzone)
  }

  if(return.val=="dayofyear"){
    ##add 1 because POSIXt is base zero
    return(date$yday+1)
  }else if(return.val=="weekofyear"){
    ##Based on the ISO8601 weekdate system,
    ## Monday is the first day of the week
    ## W01 is the week with 4 Jan in it.
    year=1900+date$year
    jan4=strptime(paste(year,1,4,sep="-"),format="%Y-%m-%d")
    wday=jan4$wday

    wday[wday==0]=7  ##convert to base 1, where Monday == 1, Sunday==7

    ##calculate the date of the first week of the year
    weekstart=jan4-(wday-1)*86400  
    weeknum=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))

    #########################################################################
    ##calculate week for days of the year occurring in the next year's week 1.
    #########################################################################
    mday=date$mday
    wday=date$wday
    wday[wday==0]=7
    year=ifelse(weeknum==53 & mday-wday>=28,year+1,year)
    weeknum=ifelse(weeknum==53 & mday-wday>=28,1,weeknum)

    ################################################################
    ##calculate week for days of the year occurring prior to week 1.
    ################################################################

    ##first calculate the number of weeks in the previous year
    year.shift=year-1
    jan4.shift=strptime(paste(year.shift,1,4,sep="-"),format="%Y-%m-%d")
    wday=jan4.shift$wday
    wday[wday==0]=7  ##convert to base 1, where Monday == 1, Sunday==7
    weekstart=jan4.shift-(wday-1)*86400
    ##units are given explicitly, as in the weeknum calculation above
    weeknum.shift=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))

    ##update year and week
    year=ifelse(weeknum==0,year.shift,year)
    weeknum=ifelse(weeknum==0,weeknum.shift,weeknum)

    return(list("year"=year,"weeknum"=weeknum))
  }else{
    stop("Unknown return.val")
  }
}
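Going the other way, from an ISO year and week number to a date (which is what the question asks for), can be sketched with the same rule the comments above state: W01 is the week containing 4 January, and weeks start on Monday. iso_week_to_date is a hypothetical helper written for this example, not part of the blog post's function or of the ISOweek package.

```r
# Hypothetical helper: date of a given weekday (1 = Monday ... 7 = Sunday)
# in a given ISO 8601 week of a given ISO week-based year.
iso_week_to_date <- function(year, week, wday = 1) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))

  # POSIXlt counts Sunday as 0; convert to Monday == 1, Sunday == 7.
  jan4_wday <- as.POSIXlt(jan4)$wday
  jan4_wday[jan4_wday == 0] <- 7

  # Monday of the week that contains 4 January is the start of W01.
  week1_monday <- jan4 - (jan4_wday - 1)

  week1_monday + (week - 1) * 7 + (wday - 1)
}

print(iso_week_to_date(2012, 1, 1))    # Monday of 2012-W01
print(iso_week_to_date(2011, 52, 7))   # Sunday of 2011-W52
print(iso_week_to_date(2008, 42, 1))   # Monday of 2008-W42
```

The helper does no validation, so a week number that does not exist in the given year (for example week 53 in a 52-week year) silently rolls over into the next year.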
