
How can I create a sequence of year-week string values based on existing dates?

I am plotting weekly figures that cross over from 2018 into 2019 and the tick marks on my X-axis represent the year then week.

For example:

2018-50, 2018-51, 2018-52, 2018-53, 2019-01, 2019-02, 2019-03

I have two data frames and the dates in either aren't always going to be the same. As such, one solution I have thought of that might work is to find the lowest yearWeek value in either data frame, and the maximum yearWeek value in either data frame, and to then create a sequence using those two values. Note that both values could either exist within a single data frame or one data frame could have the lowest/earliest value and the other the highest/latest value.

Both data frames look like this:

  week yearWeek      month  day       date
1   31  2018-31 2018-08-01  Wed 2018-08-01
2   31  2018-31 2018-08-01  Thu 2018-08-02
3   31  2018-31 2018-08-01  Fri 2018-08-03
4   31  2018-31 2018-08-01  Sat 2018-08-04
5   32  2018-32 2018-08-01  Sun 2018-08-05
6   32  2018-32 2018-08-01  Mon 2018-08-06

I have looked for a solution and this answer is almost there, but not quite.

The problems with this solution are:

  • Single-digit week numbers don't have a leading 0; and
  • Despite specifying seq(31:53), for example, the output starts from 1 (I know why this happens); and
  • There doesn't seem to be a way to stop the count at 53 using this method (2018 had a (short) 53rd week which I would like to include) and resume from 2019-01 onwards.
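
The first two points can be reproduced in a few lines of base R. This is only a minimal sketch; the variable names are illustrative and not taken from the linked answer.

```r
# seq() given a single vector longer than one element behaves like
# seq_along(): it returns the positions 1..length(x), not the values.
positions <- seq(31:53)
values <- 31:53

# sprintf("%02d", ...) left-pads single-digit week numbers with a zero.
padded <- sprintf("%02d", c(1L, 9L, 10L))

print(head(positions))  # positions, starting at 1
print(head(values))     # the intended week numbers, starting at 31
print(padded)
```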

I want to be able to set the X-axis range from 2018-31 (31st week of 2018) to 2019-13 (13th week of 2019).

Something like this:

[image]

In short, how can I create a sequence of year-week values ranging from the minimum date value to the maximum date value (in this case 2018-31 to 2019-13)?

I think this would work for you

x1 <- c(31:53)
x2 <- sprintf("%02d", c(1:13))
paste(c(rep(2018, length(x1)), rep(2019, length(x2))), c(x1, x2), sep = "-")

# [1] "2018-31" "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37" 
#     "2018-38" "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44" 
#     "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51" 
#     "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05" 
# "2019-06" "2019-07" "2019-08" "2019-09" "2019-10" "2019-11" "2019-12" "2019-13"

For the updated question we can do

# Combine both data frames (they must have the same columns)
df <- rbind(df1, df2)

# Convert the date column to class Date
df$Date <- as.Date(df$date)

# Generate a daily sequence from the earliest to the latest date,
# format each day as a year-week label and keep only the unique labels.
# "%W" is the week of the year with Monday as the first day of the week.
unique(format(seq(min(df$Date), max(df$Date), by = "day"), "%Y-%W"))
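
As a self-contained illustration of the approach above, here is a sketch with two tiny made-up data frames. `df1`, `df2` and their dates are hypothetical stand-ins for the real data. One caveat worth checking on your own data: with "%W", the days before the first Monday of a year are labelled week 00, so a daily sequence crossing into 2019 produces "2019-00". Dropping those labels in the last step is an assumption about the desired axis, not part of the original answer.

```r
# Hypothetical stand-ins for the two data frames in the question
df1 <- data.frame(date = c("2018-08-01", "2018-12-31"), stringsAsFactors = FALSE)
df2 <- data.frame(date = c("2018-10-15", "2019-04-01"), stringsAsFactors = FALSE)

# Pool the dates from both frames and build a daily sequence over the full range
all_dates <- as.Date(c(df1$date, df2$date))
days <- seq(min(all_dates), max(all_dates), by = "day")

# Label every day with year and Monday-based week number, keep unique labels
year_week <- unique(format(days, "%Y-%W"))

# Assumption: week-00 labels (days before the first Monday of a year) are unwanted
year_week_no_zero <- year_week[!grepl("-00$", year_week)]

print(head(year_week_no_zero))
print(tail(year_week_no_zero))
```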

Define two sequences, and then restrict to the range you want:

years <- c("2018", "2019")
# Zero-padded week numbers; 53 is included because 2018 had a 53rd week
weeks <- sprintf("%02d", 1:53)

# Build every year-week combination, then sort so the values run chronologically
result <- apply(expand.grid(years, weeks), 1, function(x) paste(x, collapse = "-"))
result <- sort(result)
result <- result[result >= "2018-31" & result <= "2019-13"]
result

 [1] "2018-31" "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37"
 [8] "2018-38" "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44"
[15] "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51"
[22] "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05"
[29] "2019-06" "2019-07" "2019-08" "2019-09" "2019-10" "2019-11" "2019-12"
[36] "2019-13"

Note that filtering out the unwanted values works here even though they are text strings, because every value has the same width and the week number is zero-padded on the left where necessary. Comparing and sorting the strings therefore gives the same order as comparing the underlying numbers.
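
To see why the zero padding matters, the following sketch compares padded and unpadded labels against the same bounds. The example vectors are made up for illustration.

```r
padded <- c("2018-09", "2018-31", "2019-02")
unpadded <- c("2018-9", "2018-31", "2019-2")

# Zero-padded: string comparison agrees with chronological order,
# so week 9 of 2018 is correctly excluded.
padded_ok <- padded >= "2018-31" & padded <= "2019-13"

# Unpadded: "2018-9" compares greater than "2018-31" because "9" > "3",
# and "2019-2" compares greater than "2019-13" because "2" > "1".
unpadded_ok <- unpadded >= "2018-31" & unpadded <= "2019-13"

print(padded_ok)
print(unpadded_ok)
```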

Here is a possibility using the str_pad function from the stringr package:

library(stringr)

weeks <- str_pad(41:65 %% 53 + 1, 2, "left", "0")
years <- ifelse(41:65 <= 52, "2018", "2019")
paste(years, weeks, sep = "-")
 [1] "2018-42" "2018-43" "2018-44" "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51" "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05" "2019-06" "2019-07" "2019-08" "2019-09"
[22] "2019-10" "2019-11" "2019-12" "2019-13"

As I just learned from the other two answers, sprintf provides a base R alternative to str_pad, so you can also use:

weeks <- sprintf("%02d", 41:65 %% 53 + 1)

Here is a possibility using strftime :

weeks <- seq(from = ISOdate(2018,12,10), to = ISOdate(2019,4,1), by="week")
strftime(weeks,format="%Y-%W") 
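
The example above starts on 2018-12-10, which is a Monday, so every weekly step lands on a Monday and no week-00 label appears. A possible generalisation, sketched here under that same assumption, rolls an arbitrary start date back to its Monday first. `year_week_seq` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: weekly year-week labels between two dates,
# stepping from the Monday on or before `start`.
year_week_seq <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  # "%u" gives the weekday as 1 (Monday) to 7 (Sunday)
  days_since_monday <- as.integer(format(start, "%u")) - 1L
  mondays <- seq(start - days_since_monday, end, by = "week")
  # "%W" is the Monday-based week of the year
  format(mondays, "%Y-%W")
}

weeks_out <- year_week_seq("2018-08-01", "2019-04-01")
print(weeks_out)
```

This sketch does not handle every edge case, for example a start date whose Monday falls in the previous year.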
