
Reorder data frame from calendar year to water year using R

I know similar questions have been asked before (here and here), but their assumptions and goals are different enough that I haven't been able to adapt the answers to this situation. I am also an R novice.

I have a data frame structured like this:

STATION     DATE        PRCP 
USC00352972 1910-01-01  0 
USC00352972 1910-02-01  0   
USC00352972 1910-03-01  0
USC00352972 1910-04-01  0
USC00352972 1910-05-01  0
USC00352972 1910-06-01  0
USC00352972 1910-07-01  0
USC00352972 1910-08-01  0
USC00352972 1910-09-01  0
USC00352972 1910-10-01  0
USC00352972 1910-11-01  0
USC00352972 1910-12-01  0
...         ...         .
US1ORLA0076 2018-01-01  0
US1ORLA0076 2018-02-01  0
US1ORLA0076 2018-03-01  0
US1ORLA0076 2018-04-01  0
US1ORLA0076 2018-05-01  0
US1ORLA0076 2018-06-01  0
US1ORLA0076 2018-07-01  0
US1ORLA0076 2018-08-01  0
US1ORLA0076 2018-09-01  0
US1ORLA0076 2018-10-01  0
US1ORLA0076 2018-11-01  0
US1ORLA0076 2018-12-01  0

The data contains dozens of stations and hundreds of thousands of observations. It is sorted alphabetically by station and then chronologically by calendar year (Jan-Dec).

I want to rearrange this data set so that it is listed by our "water year" (Oct-Sep). Conceptually, this is as simple as:

For each row (in chronological order): if the row's month is 10-12, place that row directly above its station's earliest dated row.
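In R, a row-by-row move like this is usually expressed as a single sort rather than a loop: give every row a key and let `order()` do the rearranging. The minimal base R sketch below assumes one reading of the rule, namely that each calendar year's October to December rows should come before that year's January to September rows, and that `DATE` is stored as `"YYYY-MM-DD"` text. The toy data frame `df` and the `cal_year`/`cal_month` helpers are hypothetical.

```r
# Hypothetical toy data in the question's layout: one station, monthly rows for 1910
df <- data.frame(
  STATION = "USC00352972",
  DATE    = sprintf("1910-%02d-01", 1:12),
  PRCP    = 0,
  stringsAsFactors = FALSE
)

dates     <- as.Date(df$DATE)                 # "YYYY-MM-DD" parses with the default format
cal_year  <- as.integer(format(dates, "%Y"))
cal_month <- as.integer(format(dates, "%m"))

# Sort keys, in priority order: station, calendar year,
# Oct-Dec (FALSE) before Jan-Sep (TRUE), then the date itself
reordered <- df[order(df$STATION, cal_year, cal_month < 10, dates), ]
reordered$DATE
```

Because `FALSE` sorts before `TRUE`, the logical key `cal_month < 10` is what lifts October to December ahead of January; the trailing `dates` key keeps the months in order inside each group.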

But I doubt that this logic fits idiomatic R, and I'm unsure how to code it anyway. What is the most conventional way to achieve this result in R? What is the most efficient?

One option is to introduce a new column to sort on. Subtract one year from the date when the month is between October and December, so that those rows sort ahead of the January to September rows of the same calendar year. The helper column is dropped after sorting.

library(dplyr)
library(lubridate)

df %>%
  mutate(DATE = ymd(DATE)) %>%                 # DATE is year-month-day text
  mutate(WaterPeriod =                         # Oct-Dec rows sort one year earlier
      if_else(month(DATE) >= 10, DATE - years(1), DATE)) %>%
  arrange(STATION, WaterPeriod) %>%
  select(-WaterPeriod)                         # drop the helper sort key
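To see the order this produces, the pipeline can be run on a small toy data frame. This sketch assumes two made-up stations (`STATION_A`, `STATION_B`) with monthly rows for 1910 and 1911. It uses `dplyr::if_else()`, which keeps the `Date` class, so no `as.Date(..., origin = ...)` round trip is needed.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: two stations, monthly rows for 1910-1911
month_starts <- seq(as.Date("1910-01-01"), as.Date("1911-12-01"), by = "month")
df <- data.frame(
  STATION = rep(c("STATION_A", "STATION_B"), each = length(month_starts)),
  DATE    = rep(format(month_starts), times = 2),
  PRCP    = 0,
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(DATE = ymd(DATE),
         # Oct-Dec rows borrow the previous year so they sort ahead of January
         WaterPeriod = if_else(month(DATE) >= 10, DATE - years(1), DATE)) %>%
  arrange(STATION, WaterPeriod) %>%
  select(-WaterPeriod)

head(out$DATE, 4)
```

For each station the rows come out as Oct, Nov, Dec 1910, then Jan to Sep 1910, then Oct to Dec 1911, then Jan to Sep 1911.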

A simple base R approach: if the month is October, November or December, shift the year forward by one.

# Example dates: every 80 days starting 1910-01-02
xd <- as.Date(seq(1, 1500, by=80), origin="1910-01-01")

# Start from the calendar year, then add one for October-December
w.year <- as.numeric(format(xd, "%Y"))
oct.nov.dec <- as.numeric(format(xd, "%m")) > 9
w.year[oct.nov.dec] <- w.year[oct.nov.dec] + 1

data.frame("Calendar_date"=xd, "Water_year"=w.year)

#    Calendar_date Water_year
# 1     1910-01-02       1910
# 2     1910-03-23       1910
# 3     1910-06-11       1910
# 4     1910-08-30       1910
# 5     1910-11-18       1911
# 6     1911-02-06       1911
# 7     1911-04-27       1911
# 8     1911-07-16       1911
# 9     1911-10-04       1912
# 10    1911-12-23       1912
# 11    1912-03-12       1912
# 12    1912-05-31       1912
# 13    1912-08-19       1912
# 14    1912-11-07       1913
# 15    1913-01-26       1913
# 16    1913-04-16       1913
# 17    1913-07-05       1913
# 18    1913-09-23       1913
# 19    1913-12-12       1914
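The same label can be attached to the question's data frame as a column. Under this convention (calendar year + 1 for October to December) the water year never decreases as the date increases, so sorting by station and then date already keeps each October to September block together; the new column is then mostly useful for grouping or filtering. A base R sketch on hypothetical toy data; the `WATER_YEAR` column name is made up for illustration.

```r
# Hypothetical toy data: one station, monthly rows for 1910-1911
month_starts <- seq(as.Date("1910-01-01"), as.Date("1911-12-01"), by = "month")
df <- data.frame(
  STATION = "USC00352972",
  DATE    = format(month_starts),
  PRCP    = 0,
  stringsAsFactors = FALSE
)

dates <- as.Date(df$DATE)
# Logical TRUE adds 1, so Oct-Dec count toward the next water year
df$WATER_YEAR <- as.integer(format(dates, "%Y")) +
  (as.integer(format(dates, "%m")) > 9)

# Sort by station, then date; water years stay contiguous
df <- df[order(df$STATION, dates), ]
table(df$WATER_YEAR)
```

With these toy rows, water year 1910 holds Jan to Sep 1910 (9 rows), 1911 holds Oct 1910 to Sep 1911 (12 rows), and 1912 holds Oct to Dec 1911 (3 rows).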
