Aggregate by month-year and keep type as date
I want to create some summary statistics of a data.table, aggregating by month and year of the date column. Here's what I start with:
> head(monthly)
betnr persnr idnum frau gebjahr te_med month tentgelt status
1: 50536344 62181514 40442 1 1960 76.52142 1993-12-01 0.5777598 fire
2: 50536344 62744472 40442 0 1963 76.52142 1993-08-01 0.5777598 fire
3: 50536344 63071749 40442 0 1947 76.52142 1993-12-01 0.5777598 fire
4: 50536344 63385685 40442 1 1946 76.52142 1993-07-01 0.5777598 fire
5: 50536344 63918388 40442 0 1952 76.52142 1993-12-01 0.5777598 fire
6: 50536344 61961225 40442 0 1980 71.90094 1994-12-01 23.1001672 fire
To create my statistics, I then run:
statistics2 <- monthly[, list(NOBS = .N, MWAGE=mean(tentgelt)), by=list(status, month=format(month, '%m-%Y'))]
This creates the correct statistics, but the month column now contains a string. I try to change the type back to date by fixing the day to always be 01:
x <-apply(statistics2, 1, function(x) paste('01-',x['month'], sep=''))
statistics2[, month:= as.Date(x, '%d-%m-%Y')]
Which gives me the desired output:
> head(statistics2)
status month NOBS MWAGE
1: hire 1993-01-01 37914 0.5820961
2: normal 1993-01-01 790 0.5787695
3: hire 1994-01-01 6471 15.1267445
4: normal 1994-01-01 23931 22.8101928
5: hire 1993-02-01 435 0.5946736
6: normal 1993-02-01 38661 0.5820226
However, my whole approach feels a bit clunky. Is there a cleaner way of getting the desired output?
Yes, you can make it simpler and do everything in one go. Just do the whole conversion to Date class inside the aggregation:
statistics2 <- monthly[, list(NOBS = .N,
MWAGE = mean(tentgelt)),
by = list(status, month = as.Date(format(month, '%Y-%m-01')))]
statistics2
# status month NOBS MWAGE
# 1: fire 1993-12-01 3 0.5777598
# 2: fire 1993-08-01 1 0.5777598
# 3: fire 1993-07-01 1 0.5777598
# 4: fire 1994-12-01 1 23.1001672
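For anyone who wants to try this without the original data, here is a minimal reproducible sketch. The toy table only borrows the tentgelt and status values from the six rows printed in the question; the day-of-month values are made up (an assumption, not the asker's data) so that the truncation to the first of the month is visible.

```r
library(data.table)

# Toy data: wages and status follow the question's head(monthly) output;
# the days (15th, 3rd, 28th, ...) are invented to show the truncation.
monthly <- data.table(
  month    = as.Date(c("1993-12-15", "1993-08-03", "1993-12-28",
                       "1993-07-09", "1993-12-01", "1994-12-20")),
  tentgelt = c(0.5777598, 0.5777598, 0.5777598,
               0.5777598, 0.5777598, 23.1001672),
  status   = "fire"
)

# format() writes each date as "YYYY-MM-01"; as.Date() parses that string
# back, so the grouping column is a Date rather than a character vector.
statistics2 <- monthly[, list(NOBS = .N, MWAGE = mean(tentgelt)),
                       by = list(status,
                                 month = as.Date(format(month, "%Y-%m-01")))]

# Check the column type of the grouping variable
class(statistics2$month)
```

Because the grouping key is already a Date, no second pass over the result is needed, and the output can be sorted or plotted chronologically straight away.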
Some side notes:
Your apply method is not how you should have done this with data.table. You could accomplish your last step simply by doing:
statistics2[, month := as.Date(paste0("01-", month), "%d-%m-%Y")]
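The reason to avoid apply here is that it first converts the table to a matrix (so every column becomes character) and then calls the function once per row, whereas paste0 and as.Date are vectorised and work on the whole column at once. A small sketch on a hypothetical table dt, whose month strings are made up for illustration:

```r
library(data.table)

# Hypothetical table with month stored as "mm-YYYY" strings,
# the format produced by format(month, '%m-%Y') in the question.
dt <- data.table(month = c("01-1993", "02-1993", "12-1994"))

# Prepend a fixed day and parse the whole column in one vectorised call;
# := replaces the character column with a Date column by reference.
dt[, month := as.Date(paste0("01-", month), "%d-%m-%Y")]

# Check the column type after the in-place update
class(dt$month)
```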