R: How to get the week number of the month

I am new to R.
I want the week number within the month for a given date.

By using the following code:

CurrentDate <- Sys.Date()
WeekNumber <- format(CurrentDate, format = "%U")
WeekNumber
#> [1] "31"

%U returns the week number of the year.
But I want the week number of the month.
If the date is 2014-08-01, I want to get 1 (the date belongs to the 1st week of the month).

E.g.:
2014-09-04 -> 1 (the date belongs to the 1st week of the month).
2014-09-10 -> 2 (the date belongs to the 2nd week of the month).
and so on...

How can I get this?

Reference: http://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html

By analogy with the weekdays function:

monthweeks <- function(x) {
    UseMethod("monthweeks")
}
monthweeks.Date <- function(x) {
    ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.POSIXlt <- function(x) {
    ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.character <- function(x) {
    ceiling(as.numeric(format(as.Date(x), "%d")) / 7)
}
dates <- sample(seq(as.Date("2000-01-01"), as.Date("2015-01-01"), "days"), 7)
dates
#> [1] "2004-09-24" "2002-11-21" "2011-08-13" "2008-09-23" "2000-08-10" "2007-09-10" "2013-04-16"
monthweeks(dates)
#> [1] 4 3 2 4 2 2 3

Another solution is to use stri_datetime_fields() from the stringi package:

stringi::stri_datetime_fields(dates)$WeekOfMonth
#> [1] 4 4 2 4 2 3 3

You can use day() from the lubridate package. I'm not sure if there's a week-of-month function in the package, but we can do the math.

library(lubridate)
curr <- Sys.Date()
# [1] "2014-08-08"
day(curr)               ## 8th day of the current month
# [1] 8
day(curr) / 7           ## Technically, it's the 1.14th week
# [1] 1.142857
ceiling(day(curr) / 7)  ## but ceiling() will take it up to the 2nd week.
# [1] 2

Issue Overview

It was difficult to tell which answers worked, so I built my own function, nth_week, and tested it against the others.

The issue that leads to most of the answers being incorrect is this:

  • The first week of a month is often a short week
  • The same goes for the last week of the month

For example, October 1st 2019 is a Tuesday, so October 6th (a Sunday) is already in the second week. Also, contiguous months often share the same week in their respective counts, meaning that the last week of the prior month is commonly also the first week of the current month. So we should expect a week count higher than 52 per year, and some months that span 6 weeks.
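A minimal base-R sketch of the counting described above, assuming weeks start on Sunday. The helper names month_days and week_rows are hypothetical and do not come from any answer in this thread; the sketch simply starts a new week at every Sunday after the 1st of the month.

```r
# All days of a month, given its first day and its length
month_days <- function(first_day, n_days) {
  seq(as.Date(first_day), by = "day", length.out = n_days)
}

# Week index within the month for a run of days that starts on the 1st
week_rows <- function(days) {
  is_sunday <- as.POSIXlt(days)$wday == 0   # wday: 0 = Sunday ... 6 = Saturday
  is_sunday[1] <- FALSE                     # the 1st always opens week 1
  1L + cumsum(is_sunday)
}

oct_2019 <- month_days("2019-10-01", 31)    # starts on a Tuesday
dec_2018 <- month_days("2018-12-01", 31)    # starts on a Saturday

week_rows(oct_2019)[6]        # October 6th (Sunday) is in week 2
table(week_rows(oct_2019))    # days per week: 5 7 7 7 5
max(week_rows(dec_2018))      # December 2018 spans 6 weeks
```

The short first and last weeks show up directly in the table for October 2019.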

Results Comparison

Here's a table showing examples where some of the algorithms suggested above go awry:

DATE            Tori user206 Scri Klev Stringi Grot Frei Vale epi iso coni
Fri-2016-01-01    1     1      1   1      5      1    1    1    1   1   1
Sat-2016-01-02    1     1      1   1      1      1    1    1    1   1   1
Sun-2016-01-03    2     1      1   1      1      2    2    1  -50   1   2
Mon-2016-01-04    2     1      1   1      2      2    2    1  -50 -51   2
----
Sat-2018-12-29    5     5      5   5      5      5    5    4    5   5   5
Sun-2018-12-30    6     5      5   5      5      6    6    4  -46   5   6
Mon-2018-12-31    6     5      5   5      6      6    6    4  -46 -46   6
Tue-2019-01-01    1     1      1   1      6      1    1    1    1   1   1

You can see that only Grothendieck, conighion, Freitas, and Tori are correct, due to their treatment of partial week periods. I compared all days from year 100 to year 3000; there are no differences among those 4. (Stringi is probably correct for noting weekends as separate, incremented periods, but I didn't check to be sure; epiweek() and isoweek(), because of their intended uses, show some odd behavior near year-ends when used for week incrementation.)

Speed Comparison

Below are efficiency tests comparing the implementations of Tori, Grothendieck, conighion, and Freitas:

# prep
library(lubridate)
library(tictoc)

kepler<- ymd(15711227) # Kepler's birthday since it's a nice day and gives a long vector of dates
some_dates<- seq(kepler, today(), by='day')

# test speed of Tori algorithm
tic(msg = 'Tori')
Tori<- (5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
toc()
Tori: 0.19 sec elapsed
# test speed of Grothendieck algorithm
wk <- function(x) as.numeric(format(x, "%U"))
tic(msg = 'Grothendieck')
Grothendieck<- (wk(some_dates) - wk(as.Date(cut(some_dates, "month"))) + 1)
toc()
Grothendieck: 1.99 sec elapsed
# test speed of conighion algorithm
tic(msg = 'conighion')
weeknum <- as.integer( format(some_dates, format="%U") )
mindatemonth <- as.Date( paste0(format(some_dates, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
conighion <- weeknum - (weeknummin - 1) # this is as an integer
toc()
conighion: 2.42 sec elapsed
# test speed of Freitas algorithm
first_day_of_month_wday <- function(dx) {
   day(dx) <- 1
   wday(dx)
 }
tic(msg = 'Freitas')
Freitas<- ceiling((day(some_dates) + first_day_of_month_wday(some_dates) - 1) / 7)
toc()
Freitas: 0.97 sec elapsed
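The Tori and Freitas expressions timed above appear to be the same arithmetic written two ways. A small sketch in base R checks this exhaustively over every day of month and every weekday the month can start on, assuming lubridate's default weekday numbering (1 = Sunday ... 7 = Saturday). The name first_wday is introduced here only for illustration.

```r
# Every (day of month, weekday of the 1st) combination
grid <- expand.grid(day = 1:31, first_wday = 1:7)

# Tori: add a constant of 5, then integer-divide by 7
tori <- (5 + grid$day + grid$first_wday) %/% 7

# Freitas: shift the day by the weekday of the 1st, then take the ceiling
freitas <- ceiling((grid$day + grid$first_wday - 1) / 7)

all(tori == freitas)   # TRUE: the two formulas agree on all 217 cases
range(tori)            # 1 6: a month spans at most 6 weeks
```

So the speed difference comes from how the inputs are computed, not from the week formula itself.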



Fastest correct algorithm, by at least about 5X

require(lubridate)

(5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7

# some_dates above is any vector of dates, like:
some_dates<- seq(ymd(20190101), today(), 'day')



Function Implementation

I also wrote a generalized function for it that performs either month or year week counts, begins on a day you choose (i.e. say you want to start your week on Monday), labels output for easy checking, and is still extremely fast thanks to lubridate.

nth_week<- function(dates = NULL,
                    count_weeks_in = c("month","year"),
                    begin_week_on = "Sunday"){

  require(lubridate)

  count_weeks_in<- tolower(count_weeks_in[1])

  # day_names and day_index are for beginning the week on a day other than Sunday
  # (this vector ordering matters, so careful about changing it)
  day_names<- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")

  # index integer of first match
  day_index<- pmatch(tolower(begin_week_on),
                     tolower(day_names))[1]


  ### Calculate week index of each day

  if (!is.na(pmatch(count_weeks_in, "year"))) {

    # For year:
    # sum the day of year, index for day of week at start of year, and constant 5 
    #  then integer divide quantity by 7   
    # (explicit on package so lubridate and data.table don't fight)
    n_week<- (5 + 
                lubridate::yday(dates) + 
                lubridate::wday(lubridate::floor_date(dates, 'year'),
                                week_start = day_index)
    ) %/% 7

  } else {

    # For month:
    # same algorithm as above, but for month rather than year
    n_week<- (5 + 
                lubridate::day(dates) + 
                lubridate::wday(lubridate::floor_date(dates, 'month'),
                                week_start = day_index)
    ) %/% 7

  }

  # naming very helpful for review
  names(n_week)<- paste0(lubridate::wday(dates, label = TRUE), '-', dates)

  n_week

}
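The begin_week_on argument is resolved with pmatch(), so abbreviations work as long as they are unambiguous. A short sketch of that lookup in isolation (match_day is a hypothetical wrapper, not part of nth_week):

```r
# Same ordering as in nth_week: index 1 = Monday ... 7 = Sunday
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Case-insensitive partial match against the day names
match_day <- function(begin_week_on) {
  pmatch(tolower(begin_week_on), tolower(day_names))
}

match_day("Mon")      # 1
match_day("sunday")   # 7
match_day("T")        # NA: ambiguous between Tuesday and Thursday
```

nth_week as written does not check for an NA index, so it is probably safest to pass at least the first two or three letters of the day name.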



Function Output

# Example raw vector output: 
some_dates<- seq(ymd(20190930), today(), by='day')
nth_week(some_dates)

Mon-2019-09-30 Tue-2019-10-01 Wed-2019-10-02 
             5              1              1 
Thu-2019-10-03 Fri-2019-10-04 Sat-2019-10-05 
             1              1              1 
Sun-2019-10-06 Mon-2019-10-07 Tue-2019-10-08 
             2              2              2 
Wed-2019-10-09 Thu-2019-10-10 Fri-2019-10-11 
             2              2              2 
Sat-2019-10-12 Sun-2019-10-13 
             2              3 
# Example tabled output:
library(tidyverse)

nth_week(some_dates) %>% 
  enframe('DATE','nth_week_default') %>% 
  cbind(some_year_day_options = as.vector(nth_week(some_dates, count_weeks_in = 'year', begin_week_on = 'Mon')))

             DATE nth_week_default some_year_day_options
1  Mon-2019-09-30                5                    40
2  Tue-2019-10-01                1                    40
3  Wed-2019-10-02                1                    40
4  Thu-2019-10-03                1                    40
5  Fri-2019-10-04                1                    40
6  Sat-2019-10-05                1                    40
7  Sun-2019-10-06                2                    40
8  Mon-2019-10-07                2                    41
9  Tue-2019-10-08                2                    41
10 Wed-2019-10-09                2                    41
11 Thu-2019-10-10                2                    41
12 Fri-2019-10-11                2                    41
13 Sat-2019-10-12                2                    41
14 Sun-2019-10-13                3                    41

Hope this work saves people the time of having to weed through all the responses to figure out which are correct.

I don't know R, but if you take the week of the first day of the month, you can use it to get the week within the month:

2014-09-18
First day of month = 2014-09-01
Week of first day on month = 36
Week of 2014-09-18 = 38
Week in the month = 1 + (38 - 36) = 3
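A minimal sketch of the arithmetic above in base R, assuming Sunday-start weeks via the %U format code. The week numbers 36 and 38 in the example look like ISO week numbers; %U gives 35 and 37 for the same dates, but the difference, and so the result, is the same here. The helper name week_of_year is hypothetical.

```r
# Week of the year with Sunday as the first day of the week (%U)
week_of_year <- function(d) as.integer(format(as.Date(d), "%U"))

d <- as.Date("2014-09-18")
first_of_month <- as.Date(format(d, "%Y-%m-01"))   # 2014-09-01

week_of_year(first_of_month)                           # 35
week_of_year(d)                                        # 37
1 + (week_of_year(d) - week_of_year(first_of_month))   # 3
```

Because %U never wraps around inside a calendar year, the subtraction stays non-negative for every date.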

Using lubridate you can do

ceiling((day(date) + first_day_of_month_wday(date) - 1) / 7)

where the function first_day_of_month_wday returns the weekday of the first day of the month.

first_day_of_month_wday <- function(dx) {
  day(dx) <- 1
  wday(dx)
}

This adjustment is needed to get the correct week number. Otherwise, for example, if the 7th day of the month falls on a Monday, you will get 1 instead of 2. It is only a shift in the day of the month. The minus 1 is necessary because no adjustment is needed when the first day of the month is a Sunday, and the other weekdays follow the same rule.
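To make the shift concrete, here is a small base-R sketch for 2019-10-07, a Monday that is the 7th day of its month. It assumes the same weekday numbering as lubridate's default (1 = Sunday ... 7 = Saturday), reproduced here with as.POSIXlt so the example has no package dependency.

```r
d <- as.Date("2019-10-07")                          # a Monday
day_of_month <- as.integer(format(d, "%d"))         # 7

# Weekday of the 1st of the month, numbered 1 = Sunday ... 7 = Saturday
first_of_month <- as.Date(format(d, "%Y-%m-01"))
first_wday <- as.POSIXlt(first_of_month)$wday + 1L  # 3 (Tuesday)

ceiling(day_of_month / 7)                     # 1: without the adjustment
ceiling((day_of_month + first_wday - 1) / 7)  # 2: with the adjustment
```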

I came across the same issue and solved it with mday from the data.table package. I also realized that when using the ceiling() function, one needs to account for the '5th week' situation. For example, for the 30th day of a month, ceiling(30/7) gives 5! Hence the ifelse statement below.

library(data.table)

# Create a sample data table with days from year 0 until present
DT <- data.table(days = seq(as.Date("0-01-01"), Sys.Date(), "days"))
# compute the week of the month and account for the '5th week' case
DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )]

> DT
              days week
     1: 0000-01-01    1
     2: 0000-01-02    1
     3: 0000-01-03    1
     4: 0000-01-04    1
     5: 0000-01-05    1
    ---                
736617: 2016-10-14    2
736618: 2016-10-15    3
736619: 2016-10-16    3
736620: 2016-10-17    3
736621: 2016-10-18    3

To get an idea of the speed, run:

system.time( DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )] )
   # user  system elapsed 
   # 3.23    0.05    3.27

It took approx. 3 seconds to compute the weeks for more than 700,000 days.

However, the ceiling approach above makes the last week longer than the others in most months (in 30- and 31-day months, the four weeks have 7, 7, 7, and 9 or 10 days). Another way would be to use something like

ceiling(1:31/31*4)
 [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4

where you get 7, 8, 8 and 8 days in the respective weeks of a 31-day month.

DT[, week2 := ceiling(mday(days)/31*4)]
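The two groupings can be compared side by side on a plain 1:31 vector, without data.table. This is only a sketch: pmin() is used here as a shorter stand-in for the ifelse() above (an assumption of equivalence that holds because ceiling(day / 7) never exceeds 5).

```r
days <- 1:31

# ceiling(day / 7) with the 5th week folded into the 4th
capped <- pmin(ceiling(days / 7), 4)

# day scaled onto four weeks of a 31-day month
proportional <- ceiling(days / 31 * 4)

table(capped)         # days per week: 7 7 7 10
table(proportional)   # days per week: 7 8 8 8
```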

There is a simple way to do it with the lubridate package:

isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.

epiweek() is the US CDC version of epidemiological week. It follows the same rules as isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek().

Reference here
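One caveat when turning ISO week numbers into a week of the month: near year-ends the ISO week can belong to the neighbouring year, so a simple subtraction can go negative. A sketch in base R, using the %V format code as a stand-in for isoweek() (iso_week is a hypothetical helper name):

```r
# ISO 8601 week of the year via base R's %V format code
iso_week <- function(d) as.integer(format(as.Date(d), "%V"))

iso_week("2018-12-01")   # 48
iso_week("2018-12-31")   # 1: already ISO week 1 of 2019

# Naive "week of month" by subtraction breaks at the year boundary
iso_week("2018-12-31") - iso_week("2018-12-01") + 1   # -46
```

This appears consistent with the -46 shown in the iso column of the results table earlier in the thread for Mon-2018-12-31.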

I am late to the party and maybe no one is going to read this answer...

Anyway, why not keep it simple and do it like this:

library(lubridate)

x <- ymd(20200311, 20200308)

week(x) - week(floor_date(x, unit = "months")) + 1

[1] 3 2
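Note that lubridate's week() counts complete seven-day periods since January 1st, so the weeks produced this way start on whatever weekday January 1st fell on, not on Sunday or Monday. A base-R sketch of the same arithmetic, under that assumption about week(), with hypothetical helper names block_week and week_in_month:

```r
# Seven-day blocks counted from January 1st (POSIXlt yday is 0-based)
block_week <- function(d) as.POSIXlt(d)$yday %/% 7 + 1

# Block of the date minus block of the 1st of its month, plus one
week_in_month <- function(d) {
  d <- as.Date(d)
  first_of_month <- as.Date(format(d, "%Y-%m-01"))
  block_week(d) - block_week(first_of_month) + 1
}

week_in_month(c("2020-03-03", "2020-03-04", "2020-03-08", "2020-03-11"))
# 1 2 2 3
```

In 2020, January 1st was a Wednesday, so the count increments on Wednesday, March 4th rather than on Sunday, March 8th.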

I don't know of any built-in function, but a workaround would be

CurrentDate <- Sys.Date()
# The number of the week relative to the year
weeknum <- as.integer( format(CurrentDate, format="%U") )

# Find the minimum week of the month relative to the year
mindatemonth <- as.Date( paste0(format(CurrentDate, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month

# Calculate the number of the week relative to the month
weeknum <- weeknum - (weeknummin - 1) # this is as an integer

# With the following you can convert the integer to the same format of 
# format(CurrentDate, format="%U")
formatC(weeknum, width = 2, flag = "0")

Simply do this:

library(lubridate)

ds1$Week <- week(ds1$Sale_Date)

This is high performance! It works instantly on my 12 million row dataset. In the example above, ds1 is the dataset and Sale_Date is a date column (like "2015-11-23"). The other approach, using "as.integer( format..." might work on small datasets, but on 12 million rows it would keep running forever...
