Aggregate by month-year and keep type as date

I want to create some summary statistics of a data.table, aggregating by month and year of the date column. Here's what I start with:

> head(monthly)
      betnr   persnr idnum frau gebjahr   te_med      month   tentgelt status
1: 50536344 62181514 40442    1    1960 76.52142 1993-12-01  0.5777598   fire
2: 50536344 62744472 40442    0    1963 76.52142 1993-08-01  0.5777598   fire
3: 50536344 63071749 40442    0    1947 76.52142 1993-12-01  0.5777598   fire
4: 50536344 63385685 40442    1    1946 76.52142 1993-07-01  0.5777598   fire
5: 50536344 63918388 40442    0    1952 76.52142 1993-12-01  0.5777598   fire
6: 50536344 61961225 40442    0    1980 71.90094 1994-12-01 23.1001672   fire

To create my statistics, I then run:

statistics2 <- monthly[, list(NOBS = .N, MWAGE=mean(tentgelt)), by=list(status, month=format(month, '%m-%Y'))]

This creates the correct statistics, but the month column now contains a string. I try to change the type back to date by always fixing the day to 01:

x <-apply(statistics2, 1, function(x) paste('01-',x['month'], sep=''))
statistics2[, month:= as.Date(x, '%d-%m-%Y')]

This gives me the desired output:

> head(statistics2)
   status      month  NOBS      MWAGE
1:   hire 1993-01-01 37914  0.5820961
2: normal 1993-01-01   790  0.5787695
3:   hire 1994-01-01  6471 15.1267445
4: normal 1994-01-01 23931 22.8101928
5:   hire 1993-02-01   435  0.5946736
6: normal 1993-02-01 38661  0.5820226

However, my whole approach feels a bit clunky. Is there a cleaner way of getting the desired output?

Yes, you can make it simpler and do everything in one go. Just do the whole conversion to Date class inside the aggregation step, by formatting each date as the first day of its month:

statistics2 <- monthly[, list(NOBS = .N, 
                         MWAGE = mean(tentgelt)), 
                         by = list(status, month = as.Date(format(month, '%Y-%m-01')))]
statistics2
#    status      month NOBS      MWAGE
# 1:   fire 1993-12-01    3  0.5777598
# 2:   fire 1993-08-01    1  0.5777598
# 3:   fire 1993-07-01    1  0.5777598
# 4:   fire 1994-12-01    1 23.1001672
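
As a quick check, here is a minimal, self-contained sketch that rebuilds only the six rows shown in the question's head(monthly) output (keeping just the columns the aggregation needs) and confirms that the grouping column stays a Date, so it sorts chronologically. The names sample_monthly and stats are made up for illustration.

```r
library(data.table)

# Six rows copied from head(monthly) in the question; only the columns
# needed for the aggregation are kept. `sample_monthly` is an illustrative name.
sample_monthly <- data.table(
  month    = as.Date(c("1993-12-01", "1993-08-01", "1993-12-01",
                       "1993-07-01", "1993-12-01", "1994-12-01")),
  tentgelt = c(0.5777598, 0.5777598, 0.5777598,
               0.5777598, 0.5777598, 23.1001672),
  status   = "fire"
)

# Same aggregation as above: group on the first day of each month.
stats <- sample_monthly[, list(NOBS = .N, MWAGE = mean(tentgelt)),
                        by = list(status,
                                  month = as.Date(format(month, "%Y-%m-01")))]

class(stats$month)      # "Date"

# Because month is a Date, ordering by it is chronological, not alphabetical.
setorder(stats, month)
stats
```

With the string version ('%m-%Y'), the same sort would put "07-1993" before "08-1993" but "12-1993" before "12-1994" only by accident of the month prefix; keeping the Date class avoids relying on that.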

Some side notes:

  • As @beginner mentioned in his comment, there is no "Year-Month" date type in R, see this r-faq
  • Using apply is not the idiomatic way to do this with data.table. You could accomplish your last step with a single vectorized assignment:

     statistics2[, month := as.Date(paste0("01-", month), "%d-%m-%Y")] 
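
To illustrate the first side note, here is a small base R sketch (no packages needed) showing why a day has to be pinned: a year-month string on its own does not parse to a Date. The variable names no_day and first_of_month are illustrative only.

```r
# A year-month string alone cannot be parsed into a Date, because the
# day of the month is missing.
no_day <- as.Date("1993-12", format = "%Y-%m")
no_day
# NA

# Pinning the day to the first of the month yields a proper Date.
first_of_month <- as.Date(paste0("1993-12", "-01"), format = "%Y-%m-%d")
class(first_of_month)
# "Date"

# The month-year label can still be produced for display when needed.
format(first_of_month, "%m-%Y")
# "12-1993"
```

This is why both approaches in this thread store the first of the month: the column stays a real Date, and the month-year string is only a display format.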
