
Convert year-month string column into quarterly bins

I am currently working with a large phenology data set, where there are multiple observations of trees for a given month. I want to assign these observations to three-month clusters or bins. I am currently using the following code:

Cluster.GN <- ifelse(Master.feed.parts.gn$yr.mo=="2007.1", 1,
              ifelse(Master.feed.parts.gn$yr.mo=="2007.11", 1,....     
              ifelse(Master.feed.parts.gn$yr.mo=="2014.05", 17, NA)

This code works, but it is very cumbersome as there are over 50 months. I have had trouble finding another solution because this "binning" is not based on the number of observations (within each month there can be up to 4000 observations) and it is not strictly chronological, as some months are missing. Any help you can provide would be highly appreciated.

UPDATE I: I used the "cut" function in R. I tried setting the breaks to 17, as that is how many three-month bins I should have. But when I use table(Cluster.GN) it shows that only the odd-numbered "bins" have observations (sorry, but I can't figure out how to get the table uploaded here).

Cluster.GN <- cut(Master.feed.parts.gn$yr.mo, breaks= 17, c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17"), include.lowest=TRUE)
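One possible reason for the empty bins, sketched under the assumption that yr.mo is stored as a number of the form year + month/100 (the actual encoding is not shown in the question): cut() with breaks = 17 divides the whole numeric range into 17 equal-width intervals, so any interval that falls entirely in the gap between month 12 of one year and month 1 of the next receives no observations. The grid below is hypothetical data, not the real data set.

```r
# Assumption: yr.mo is numeric, encoded as year + month/100 (e.g. 2007.11).
# Hypothetical grid with every month of 2007..2014 present once.
grid  <- expand.grid(month = 1:12, year = 2007:2014)
yr.mo <- grid$year + grid$month / 100

# breaks = 17 requests 17 equal-width intervals over range(yr.mo),
# not 17 three-month periods.
bins <- cut(yr.mo, breaks = 17, labels = FALSE, include.lowest = TRUE)

# Tabulate against all 17 levels so that empty intervals show up as zeros.
table(factor(bins, levels = 1:17))
```

The table should show several intervals with a count of zero, which is consistent with the gaps described in the update.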

UPDATE: this answer was a quick hack; I didn't check the zoo library. For the right way to do it, please see G Grothendieck's answer using zoo::as.yearqtr().


All you need to do is convert the yr.mo field from a year-month string (e.g. 2007.11) into an integer in the range 1..17, binning on every quarter (i.e. months 1..3 into the first bin, 4..6 into the second bin, etc.). (I don't see how 8 years (2007..2014) * 4 quarters = 32 bins reduces to only 17 bins, unless your data is sparse. But anyway...)

No need for cumbersome ifelse ladders.

And for higher performance, use the stringi library's stri_split_fixed().

sample_wr <- function(...) sample(..., replace=T)

# Generate sample data (you're supposed to provide this to code, to make your issue reproducible)
set.seed(123)
N <- 20
df <- data.frame(yr.mo =
          paste(sample_wr(2007:2014, N), sample_wr(1:12, N), sep='.'),
          stringsAsFactors=FALSE)  # keep yr.mo as character, not factor
# [1] "2009.11" "2013.9"  "2010.8"  "2014.12" "2014.8"  "2007.9"  "2011.7" 
# [8] "2014.8"  "2011.4"  "2010.2"  "2014.12" "2010.11" "2012.9"  "2011.10"
#[15] "2007.1"  "2014.6"  "2008.10" "2007.3"  "2009.4"  "2014.3" 

yearmonth_to_integer <- function(xx) {
    # Split e.g. "2009.11" into c(2009, 11)
    yy_mm <- as.integer(unlist(strsplit(xx, '.', fixed=TRUE)))
    # Quarters counted from Q1 2007: 4 per year, months 1..3 -> 1, 4..6 -> 2, etc.
    return( (yy_mm[1] - 2007) * 4 + ((yy_mm[2] - 1) %/% 3) + 1 )
}

Cluster.GN <- sapply(as.character(df$yr.mo), yearmonth_to_integer)

# 2009.11  2013.9  2010.8 2014.12  2014.8  2007.9  2011.7 
#   12      27      15      32      31       3      19 
# 2014.8  2011.4  2010.2 2014.12 2010.11  2012.9 2011.10 
#   31      18      13      32      16      23      20 
# 2007.1  2014.6 2008.10  2007.3  2009.4  2014.3 
#    1      30       8       1      10      29 
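If the data is sparse and consecutive bin numbers are wanted (which may be how 32 possible quarters reduce to 17 bins), one option is to renumber only the quarters that actually occur. This is a minimal base-R sketch, assuming calendar quarters and the "year.month" string format from the question; the sample vector and the name quarter_index are made up for illustration.

```r
# Hypothetical sample input in the "year.month" string format
yr.mo <- c("2007.1", "2007.11", "2007.12", "2008.10", "2009.4", "2014.05")

# Split every string at the dot in one vectorized pass (no sapply needed)
parts <- do.call(rbind, strsplit(yr.mo, ".", fixed = TRUE))
year  <- as.integer(parts[, 1])
month <- as.integer(parts[, 2])

# Calendar quarter counted from Q1 of the first year in the data
quarter_index <- (year - min(year)) * 4L + (month - 1L) %/% 3L + 1L

# Renumber so that only quarters present in the data get labels 1, 2, 3, ...
Cluster.GN <- match(quarter_index, sort(unique(quarter_index)))
Cluster.GN
# [1] 1 2 2 3 4 5
```

Here match() against the sorted unique quarters acts as a dense rank, so missing quarters do not leave holes in the bin numbering.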

And for higher performance, use the dplyr or data.table library:

library(dplyr)

# Split yr.mo into intermediate year and month columns,
# then number the quarters from Q1 2007 (4 per year).

df %>% mutate(yr.mo   = as.character(yr.mo),
              year    = as.integer(sub('\\..*$', '', yr.mo)),
              month   = as.integer(sub('^.*\\.', '', yr.mo)),
              quarter = (year - 2007L) * 4L + (month - 1L) %/% 3L + 1L )
