
Categorize month `factor` to time periods in data.frame

Update (initial question below)

Thanks to @akrun and @ulfelder I realized my initial example wasn't complex enough, as I only had one year. Data covering several years might be more realistic, and more useful for others.

Say instead my data was,

df <- structure(list(yr_month = structure(1:7, .Label = c("2014-1", "2014-2", 
"2014-3", "2015-4", "2016-4", "2016-6", "2017-7"), class = "factor"), 
    a = c(4.14, 2.83, 3.71, 4.15, 4.63, 4.91, 5.31), b = c(4.25, 
    3.5, 3.5, 3.5, 3.5, 3.5, 5)), .Names = c("yrQ", "a", "b"
), row.names = c(NA, 7L), class = "data.frame")
df
#      yrQ    a    b
# 1 2014-1 4.14 4.25
# 2 2014-2 2.83 3.50
# 3 2014-3 3.71 3.50
# 4 2015-4 4.15 3.50
# 5 2016-4 4.63 3.50
# 6 2016-6 4.91 3.50
# 7 2017-7 5.31 5.00

and I wanted to create a category covering three periods: before March 2014 (2014-3), between 2014-3 and 2016-4, and after 2016-4, so that I got something like this,

#   yr.cat    yrQ    a    b
# 1    "A" 2014-1 4.14 4.25
# 2    "A" 2014-2 2.83 3.50
# 3    "B" 2014-3 3.71 3.50
# 4    "B" 2015-4 4.15 3.50
# 5    "B" 2016-4 4.63 3.50
# 6    "C" 2016-6 4.91 3.50
# 7    "C" 2017-7 5.31 5.00

Initial question

Say I have a data set like this,

df <- structure(list(yr_month = structure(1:7, .Label = c("2016-1", "2016-2", 
"2016-3", "2016-4", "2016-5", "2016-6", "2016-7"), class = "factor"), 
    a = c(4.14, 2.83, 3.71, 4.15, 4.63, 4.91, 5.31), b = c(4.25, 
    3.5, 3.5, 3.5, 3.5, 3.5, 5)), .Names = c("yrQ", "a", "b"
), row.names = c(NA, 7L), class = "data.frame")
df
#      yrQ    a    b
# 1 2016-1 4.14 4.25
# 2 2016-2 2.83 3.50
# 3 2016-3 3.71 3.50
# 4 2016-4 4.15 3.50
# 5 2016-5 4.63 3.50
# 6 2016-6 4.91 3.50
# 7 2016-7 5.31 5.00

Now, I can use ifelse() to categorize a numeric variable. Like this,

df$a.cat <- ifelse(df$a < 3.8, c("tiny"), ifelse(df$a < 4.8, c("medium"), c("huge")) )
df
#      yrQ    a    b  a.cat
# 1 2016-1 4.14 4.25 medium
# 2 2016-2 2.83 3.50   tiny
# 3 2016-3 3.71 3.50   tiny
# 4 2016-4 4.15 3.50 medium
# 5 2016-5 4.63 3.50 medium
# 6 2016-6 4.91 3.50   huge
# 7 2016-7 5.31 5.00   huge

But what if I want to create a variable signifying some time periods? Say before March 2016 (2016-3), between 2016-3 and 2016-5, and after 2016-5. I realize I can transform the data to ts, use window() to cut it up, and then put it back together, but isn't there a smarter way to get to something like this using ifelse() on yrQ?

It's something like this I want to get to,

  yr.cat    yrQ    a    b
1    "A" 2016-1 4.14 4.25
2    "A" 2016-2 2.83 3.50
3    "B" 2016-3 3.71 3.50
4    "B" 2016-4 4.15 3.50
5    "B" 2016-5 4.63 3.50
6    "C" 2016-6 4.91 3.50
7    "C" 2016-7 5.31 5.00

We can use cut after extracting the month substring from 'yrQ'

df$yr.cat <- cut(as.numeric(sub(".*-", "", df$yrQ)), 
               breaks = c(-Inf,2, 5, Inf), labels = LETTERS[1:3])
df$yr.cat
#[1] A A B B B C C
#Levels: A B C

Based on the updated example

cut(as.numeric(sub("-", ".", df$yrQ)),
       breaks = c(-Inf, 2014.2, 2016.5, Inf), labels = LETTERS[1:3])
#[1] A A B B B C C
#Levels: A B C
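
One caveat with the decimal trick above: a two-digit month such as 2014-10 would be read as 2014.1, the same value as 2014-1. A minimal sketch of one way around that, assuming months may have one or two digits, is to turn each label into a running month count (year * 12 + month) and cut on that. The helper name `month_index` is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper: map "YYYY-M" or "YYYY-MM" labels to a running month count.
month_index <- function(x) {
  parts <- strsplit(as.character(x), "-", fixed = TRUE)
  year  <- as.numeric(vapply(parts, `[`, character(1), 1))
  month <- as.numeric(vapply(parts, `[`, character(1), 2))
  year * 12 + month
}

yrQ <- factor(c("2014-1", "2014-2", "2014-3", "2015-4",
                "2016-4", "2016-6", "2017-7"))

# cut() intervals are right-closed by default:
# (-Inf, 2014-2], (2014-2, 2016-4], (2016-4, Inf)
yr.cat <- cut(month_index(yrQ),
              breaks = c(-Inf, month_index("2014-2"), month_index("2016-4"), Inf),
              labels = LETTERS[1:3])
as.character(yr.cat)
```

Because the breaks are built with the same helper, the cut points stay readable as year-month labels rather than hand-computed numbers.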

The input data provided in the question seems inconsistent, referring to the same column as yrQ and yr_month at different points in the data structure. We have assumed the input below instead, which is the same except that we replaced yrQ in .Names (which is suggestive of year/qtr rather than year/month) with yr_month, for consistency with the name shown in list().

df <- structure(list(yr_month = structure(1:7, .Label = c("2014-1", "2014-2", 
"2014-3", "2015-4", "2016-4", "2016-6", "2017-7"), class = "factor"), 
    a = c(4.14, 2.83, 3.71, 4.15, 4.63, 4.91, 5.31), b = c(4.25, 
    3.5, 3.5, 3.5, 3.5, 3.5, 5)), .Names = c("yr_month", "a", "b"
), row.names = c(NA, 7L), class = "data.frame")

The example data in the question only has one-digit months, but we assume it needs to work even if there is a mix of 1-digit (Jan, Feb, ..., Sep) and 2-digit (Oct, Nov, Dec) months.

1) Convert to "yearmon" class (which may also help if we need to do other things with this column), compare the column to each cut point, and add the results, giving a number 0, 1 or 2 representing before, between and after respectively. Then add 1 and use that as a subscript into a vector of the category names (here LETTERS). This could be extended to more categories by just adding more comparison terms.

library(zoo)

df$yr_month <- as.yearmon(df$yr_month) ##
transform(df, yr.cat = LETTERS[ (yr_month >= "2014-03") + (yr_month > "2016-04") + 1])

giving:

  yr_month    a    b yr.cat
1 Jan 2014 4.14 4.25      A
2 Feb 2014 2.83 3.50      A
3 Mar 2014 3.71 3.50      B
4 Apr 2015 4.15 3.50      B
5 Apr 2016 4.63 3.50      B
6 Jun 2016 4.91 3.50      C
7 Jul 2017 5.31 5.00      C
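
The comparison-sum indexing in (1) does not depend on the "yearmon" class. As a rough illustration of how it extends to more categories, here is a base R sketch on plain numbers; the values and the three cut points are invented for the example.

```r
# Toy values and cut points, chosen only for illustration.
x    <- c(1, 4, 5, 8, 12)
cuts <- c(4, 8, 10)

# Each comparison yields TRUE (1) or FALSE (0), so the sum counts how many
# cut points a value has passed; adding 1 turns that into a 1-based index.
idx <- (x >= cuts[1]) + (x > cuts[2]) + (x > cuts[3]) + 1
LETTERS[idx]
```

Choosing `>=` or `>` per term decides which side of each cut point is inclusive, which is how (1) puts 2014-03 into B while keeping 2016-04 in B as well.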

2) To do it without any packages, change the line marked ## in (1) to the line of code below. Here we convert yr_month to "Date" class and then remove the day part of its character representation. This leaves 2 digits for the month so that comparisons between 1-digit and 2-digit months work properly. (In (1) the "yearmon" class handles that automatically.)

df$yr_month <- sub("...$", "", as.Date(paste0(df$yr_month, -1)))
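
Put together, variant (2) might look like the following sketch, which rebuilds the same data with data.frame() and uses no packages. Keeping the converted strings in a separate vector `ym` is a choice made here for illustration; the answer itself overwrites the column. The comparisons are plain string comparisons, which is why the zero-padded month matters.

```r
df <- data.frame(
  yr_month = factor(c("2014-1", "2014-2", "2014-3", "2015-4",
                      "2016-4", "2016-6", "2017-7")),
  a = c(4.14, 2.83, 3.71, 4.15, 4.63, 4.91, 5.31),
  b = c(4.25, 3.5, 3.5, 3.5, 3.5, 3.5, 5)
)

# Append a day, parse as Date, then drop the trailing "-01" to get "YYYY-MM".
ym <- sub("...$", "", as.character(as.Date(paste0(df$yr_month, "-1"))))

# Zero-padded "YYYY-MM" strings sort chronologically, so comparing them
# as character values gives the same result as comparing the dates.
df$yr.cat <- LETTERS[(ym >= "2014-03") + (ym > "2016-04") + 1]
df
```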

Revised: Have made a number of revisions.
