Split dates by year and week and get correct week number per year

With the following code I split dates into years and those years into weeks:

library(lubridate)

start = as.Date('2002-01-01')
end = as.Date('2017-01-01')

dates = sample(seq(as.Date('2002-01-01 00:00:00'), as.Date('2017-04-01 00:00:00'), by="day"), end-start,replace = FALSE)

splitByYears = split(dates, year(dates))
splitYearsByWeeks = lapply(splitByYears, function(x) split(x, isoweek(x)))

Based on this output I have done several calculations. Only when I was plotting some data did I notice that this procedure does not work perfectly:

>splitYearsByWeeks
...

$`2011`$`52`
[1] "2011-01-01" "2011-01-02" "2011-12-26"


$`2012`
$`2012`$`1`
[1] "2012-12-31" "2012-01-02" "2012-01-06" "2012-01-08"

...

Here 2011-01-01 and 2011-01-02 are part of the 52nd week of 2010, but because the dates are split by year first, they are assigned to the 52nd week of 2011. The same problem appears with 2012-12-31: this date is part of week one of 2013, but it is assigned to the first week of 2012 because I apply the function to each year separately.

Splitting by year and then splitting every year into weeks gives me the format I need, but the week-year relation is not correct. To get the correct week number I can split first by week and then by year:

splitByWeek = split(dates, isoweek(dates))
splitWeeksByYear = lapply(splitByWeek, function(x) split(x, year(x)))

But this is not the format I need:

>splitWeeksByYear
...
$`53`
$`53`$`2004`
[1] "2004-12-31" "2004-12-29" "2004-12-28"

$`53`$`2005`
[1] "2005-01-01"

$`53`$`2009`
[1] "2009-12-28"

$`53`$`2015`
[1] "2015-12-30"

$`53`$`2016`
[1] "2016-01-03"

What is the best way to get the correct weeks in the format I need, a list of $year $weekNum? (Maybe transform the second result, or do it in a completely different way?)

The week-numbering according to ISO 8601 has the benefit that ISO weeks always consist of 7 days without overlap or gap (as opposed to the US and UK week-numbering conventions).

However, it may happen that a few days around New Year belong to an ISO week whose ISO week-based year differs from the calendar year of the date.

This is why lubridate has an isoyear() and an isoweek() function, and why format() recognizes the format specifiers %G and %g (ISO week-based year) and %V (ISO week).
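As a small illustration of this point, the sketch below uses only base R's format() with %Y, %G and %V on a few dates around New Year, most of them taken from the question. The variable name boundary is chosen just for this example.

```r
# A few dates around New Year (the first three appear in the question's output)
boundary <- as.Date(c("2011-01-01", "2011-01-02", "2012-12-31", "2016-01-03"))

format(boundary, "%Y")  # calendar year:       "2011" "2011" "2012" "2016"
format(boundary, "%G")  # ISO week-based year: "2010" "2010" "2013" "2015"
format(boundary, "%V")  # ISO week number:     "52" "52" "01" "53"
```

For all four dates the calendar year and the ISO week-based year disagree, which is exactly the situation in which combining year() with isoweek() puts a date into the wrong group.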

So, with a slight modification, OP's code works as expected:

library(lubridate)
splitByYears = split(dates, isoyear(dates))
splitYearsByWeeks = lapply(splitByYears, function(x) split(x, isoweek(x)))
splitYearsByWeeks$`2011`$`52`
[1] "2011-12-28" "2011-12-27" "2011-12-29" "2011-12-31" "2012-01-01" "2011-12-30"
[7] "2011-12-26"
splitYearsByWeeks$`2012`$`1`
[1] "2012-01-03" "2012-01-07" "2012-01-06" "2012-01-04" "2012-01-08" "2012-01-05"
[7] "2012-01-02"
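One way to convince yourself that the grouping is now consistent is to check that no group spans more than one Monday-to-Sunday week. The following is a minimal sketch under a few assumptions: it uses base R's format() with %G and %V instead of lubridate, it uses every day in the question's date range rather than a random sample so that the result is deterministic, and split_nested() and week_spans() are hypothetical helper names introduced only for this check.

```r
# Every day in the question's date range (no sampling, so the result is deterministic)
dates <- seq(as.Date("2002-01-01"), as.Date("2017-04-01"), by = "day")

# Hypothetical helper: split by a year key first, then by ISO week number
split_nested <- function(x, year_key) {
  lapply(split(x, year_key), function(d) split(d, as.integer(format(d, "%V"))))
}

# Hypothetical helper: days between the earliest and latest date in each group
week_spans <- function(nested) {
  unlist(lapply(nested, function(weeks) {
    vapply(weeks, function(w) as.integer(diff(range(w))), integer(1))
  }))
}

by_calendar_year <- split_nested(dates, format(dates, "%Y"))  # the question's approach
by_iso_year      <- split_nested(dates, format(dates, "%G"))  # the ISO-year approach

any(week_spans(by_calendar_year) > 6)  # TRUE: some groups mix January and December days
all(week_spans(by_iso_year) <= 6)      # TRUE: every group fits inside a single week
```

The nested result keeps the $year $weekNum shape asked for in the question, for example by_iso_year$`2011`$`52`.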

However, splitting dates by ISO week-based year and ISO week can also be done in one go, in three slightly different ways:

splitted <- split(dates, format(dates, "%G-W%V"))
splitted$`2011-W52`
[1] "2011-12-28" "2011-12-27" "2011-12-29" "2011-12-31" "2012-01-01" "2011-12-30"
[7] "2011-12-26"
splitted$`2012-W01`
[1] "2012-01-03" "2012-01-07" "2012-01-06" "2012-01-04" "2012-01-08" "2012-01-05"
[7] "2012-01-02"

Alternatively, you may use the ISOweek package, of which I am the author:

splitted <- split(dates, ISOweek::ISOweek(dates))

The split() function also accepts a list of factors, in which case their interaction is used for the grouping:

library(lubridate)
splitted <- split(dates, list(isoyear(dates), isoweek(dates)))
splitted$`2011.52`
[1] "2011-12-28" "2011-12-27" "2011-12-29" "2011-12-31" "2012-01-01" "2011-12-30"
[7] "2011-12-26"
splitted$`2012.1`
[1] "2012-01-03" "2012-01-07" "2012-01-06" "2012-01-04" "2012-01-08" "2012-01-05"
[7] "2012-01-02"
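One detail that may matter with the list-of-factors variant: because the grouping is built from the interaction of both factors, split() by default also returns empty groups for year/week combinations that never occur, such as week 53 of a year that only has 52 ISO weeks. The sketch below shows this with base R only; it assumes every day in the question's date range instead of a sample, and the names keys, with_empty and without_empty are made up for this example.

```r
# Every day in the question's date range
dates <- seq(as.Date("2002-01-01"), as.Date("2017-04-01"), by = "day")

# ISO week-based year and ISO week number as grouping keys
keys <- list(format(dates, "%G"), as.integer(format(dates, "%V")))

with_empty    <- split(dates, keys)               # default: drop = FALSE
without_empty <- split(dates, keys, drop = TRUE)  # unused combinations are dropped

length(with_empty$`2002.53`)         # 0: ISO year 2002 has no week 53
"2002.53" %in% names(without_empty)  # FALSE
all(lengths(without_empty) > 0)      # TRUE
```

Passing drop = TRUE is probably what you want if the result is later iterated over, since empty groups would otherwise have to be filtered out by hand.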
