简体   繁体   中英

Aggregate by month-year and keep type as date

I want to create some summary statistics of a data.table , aggregating by month and year of the date column. Here's what I start with:

> head(monthly)
      betnr   persnr idnum frau gebjahr   te_med      month   tentgelt status
1: 50536344 62181514 40442    1    1960 76.52142 1993-12-01  0.5777598   fire
2: 50536344 62744472 40442    0    1963 76.52142 1993-08-01  0.5777598   fire
3: 50536344 63071749 40442    0    1947 76.52142 1993-12-01  0.5777598   fire
4: 50536344 63385685 40442    1    1946 76.52142 1993-07-01  0.5777598   fire
5: 50536344 63918388 40442    0    1952 76.52142 1993-12-01  0.5777598   fire
6: 50536344 61961225 40442    0    1980 71.90094 1994-12-01 23.1001672   fire

To create my statistics, I then run

statistics2 <- monthly[, list(NOBS = .N, MWAGE=mean(tentgelt)), by=list(status, month=format(month, '%m-%Y'))]

This creates the correct statistics, but the month column now contains a string. I try to change the type to date by fixing the days to be 01 always:

x <-apply(statistics2, 1, function(x) paste('01-',x['month'], sep=''))
statistics2[, month:= as.Date(x, '%d-%m-%Y')]

Which gives me the desired output:

> head(statistics2)
   status      month  NOBS      MWAGE
1:   hire 1993-01-01 37914  0.5820961
2: normal 1993-01-01   790  0.5787695
3:   hire 1994-01-01  6471 15.1267445
4: normal 1994-01-01 23931 22.8101928
5:   hire 1993-02-01   435  0.5946736
6: normal 1993-02-01 38661  0.5820226

However, my whole approach feels a bit cloddy. Is there a cleaner way of getting the desired output?

Yes, you can make it simpler and do everything in one go. Just do the whole conversion to Date class in the aggregation process

statistics2 <- monthly[, list(NOBS = .N, 
                         MWAGE = mean(tentgelt)), 
                         by = list(status, month = as.Date(format(month, '%Y-%m-01')))]
statistics2
#    status      month NOBS      MWAGE
# 1:   fire 1993-12-01    3  0.5777598
# 2:   fire 1993-08-01    1  0.5777598
# 3:   fire 1993-07-01    1  0.5777598
# 4:   fire 1994-12-01    1 23.1001672

Some side notes:

  • As @beginner mentioned in his comment, there is no "Year-Month" date type in R, see this r-faq
  • Your apply method is not how you should have done this with data.table . You could accomplish your last step simply by doing:

     statistics2[, month := as.Date(paste0("01-", month), "%d-%m-%Y")] 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM