Create groups based on time period

How can I create a new grouping variable for my data based on 5-year intervals?

So from this:

group <- c(rep("A", 7), rep("B", 10))
year <- c(2008:2014, 2005:2014)
dat <- data.frame(group, year)

   group year
1      A 2008
2      A 2009
3      A 2010
4      A 2011
5      A 2012
6      A 2013
7      A 2014
8      B 2005
9      B 2006
10     B 2007
11     B 2008
12     B 2009
13     B 2010
14     B 2011
15     B 2012
16     B 2013
17     B 2014

To this:

 > dat
   group year    period
1      A 2008 2005_2009
2      A 2009 2005_2009
3      A 2010 2010_2014
4      A 2011 2010_2014
5      A 2012 2010_2014
6      A 2013 2010_2014
7      A 2014 2010_2014
8      B 2005 2005_2009
9      B 2006 2005_2009
10     B 2007 2005_2009
11     B 2008 2005_2009
12     B 2009 2005_2009
13     B 2010 2010_2014
14     B 2011 2010_2014
15     B 2012 2010_2014
16     B 2013 2010_2014
17     B 2014 2010_2014

I guess I could use cut(dat$year, breaks = ??), but I don't know how to set the breaks.

Here is one way of doing it:

start <- floor(dat$year / 5) * 5   # first year of each 5-year period
dat$period <- paste(start, start + 4, sep = "_")

The trick is that floor(year/x)*x gives the largest multiple of x that is less than or equal to the year, which is the first year of its period.
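To make the arithmetic concrete, here is a small sketch of the intermediate values. The vector `years` is made up for illustration and is not part of the question's data.

```r
# Illustrative years only (an assumption, not the data from the question)
years <- c(2004, 2005, 2009, 2010, 2014)

# Divide by the step, drop the fraction, then scale back up:
# the result is the largest multiple of 5 that is <= each year
start <- floor(years / 5) * 5
start
# [1] 2000 2005 2005 2010 2010

# The integer-division operator %/% gives the same result
identical(start, (years %/% 5) * 5)
# [1] TRUE
```

Adding 4 to `start` then gives the last year of each 5-year period.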


Here is a version that should work more generally:

x <- 5             # width of each period in years
yearstart <- 2000  # a year that should begin a period
start <- floor((dat$year - yearstart) / x) * x + yearstart
dat$period <- paste(start, start + x - 1, sep = "_")

You can use yearstart to ensure that a particular year, e.g. 2000, is the first in a group even when that year is not a multiple of x.
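As a rough illustration of why the offset matters, take a step of 3 years. Without yearstart, floor(2000/3)*3 is 1998, so 2000 would end up as the last year of 1998_2000. The sketch below uses a made-up `years` vector to show the anchored version.

```r
x <- 3              # period width; 2000 is not a multiple of 3
yearstart <- 2000   # year that should open a period
years <- 2000:2006  # illustrative input (an assumption)

# Shift so that yearstart maps to 0, bin, then shift back
start <- floor((years - yearstart) / x) * x + yearstart
period <- paste(start, start + x - 1, sep = "_")
period
# [1] "2000_2002" "2000_2002" "2000_2002" "2003_2005" "2003_2005" "2003_2005"
# [7] "2006_2008"
```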

cut should do the job if you create actual Date objects from your 'year' column. Note that the 5-year bins then start at the earliest year in the data (2005 here).

## convert 'year' column to dates
yrs <- paste0(dat$year, "-01-01")
yrs <- as.Date(yrs)

## create cuts of 5 years and add them to data.frame
dat$period <- cut(yrs, "5 years")

## create desired factor levels
library(lubridate)

lvl <- as.Date(levels(dat$period))
lvl <- paste(year(lvl), year(lvl) + 4, sep = "_")
levels(dat$period) <- lvl

head(dat)
  group year    period
1     A 2008 2005_2009
2     A 2009 2005_2009
3     A 2010 2010_2014
4     A 2011 2010_2014
5     A 2012 2010_2014
6     A 2013 2010_2014
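For the numeric cut(dat$year, breaks = ...) call mentioned in the question, one possible sketch is shown below. The break points are hard-coded to cover 2005 through 2014, which is an assumption based on the sample data; `brks` and `lbls` are names introduced here for illustration.

```r
group <- c(rep("A", 7), rep("B", 10))
year <- c(2008:2014, 2005:2014)
dat <- data.frame(group, year)

# Break points every 5 years; with right = FALSE the bins are
# left-closed: [2005, 2010) and [2010, 2015)
brks <- seq(2005, 2015, by = 5)

# Labels of the form "start_end", one per bin
lbls <- paste(head(brks, -1), head(brks, -1) + 4, sep = "_")

dat$period <- cut(dat$year, breaks = brks, labels = lbls, right = FALSE)
head(dat, 3)
```

Years outside the range of `brks` would become NA, so the breaks need to span all years in the data.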
