How can we use the last row of each group in a data.table in R for a calculation?

Question

I have this data.table:

sample:

id cond date
1  A1   2012-11-19
1  A1   2013-05-09
1  A2   2014-09-05
2  B1   2015-03-05
2  B1   2015-07-06
3  A1   2015-02-05
4  B1   2012-09-26
4  B1   2015-02-05
5  B1   2012-09-26

I want to calculate the overdue days as of today's date within each group of 'id' and 'cond', so I am trying to get the number of days between the last date in each group and Sys.Date(). Only the last row of each group should get a value. The desired output is:

id cond date        overdue
1  A1   2012-11-19  NA
1  A1   2013-05-09  832 
1  A2   2014-09-05  348
2  B1   2015-03-05  NA 
2  B1   2015-07-06  44
3  A1   2015-02-05  195
4  B1   2012-09-26  NA 
4  B1   2015-02-05  195
5  B1   2012-09-26  1057

I tried to achieve this with the following code:

sample <- sample[ , overdue := Sys.Date() - date[.N], by = c('id','cond')]

But I get the following output, where the value is recycled across every row of the group:

id cond date        overdue
1  A1   2012-11-19  832
1  A1   2013-05-09  832 
1  A2   2014-09-05  348
2  B1   2015-03-05  44 
2  B1   2015-07-06  44
3  A1   2015-02-05  195
4  B1   2012-09-26  195 
4  B1   2015-02-05  195
5  B1   2012-09-26  1057

I am not sure how to restrict my code so that it does the calculation only for the last row of each group, without recycling. I am sure there are ways to do this; any help is appreciated.

Answer 1

You could make a table of the overdue values, keyed by the group they belong to, and then join it back onto the last row of each group:

bycols    = c("id","cond")
newcolDT2 = DT[, Sys.Date() - date[.N], by = bycols]

DT[newcolDT2, overdue := V1, on = bycols, mult = "last"]
#    id cond       date   overdue
# 1:  1   A1 2012-11-19   NA days
# 2:  1   A1 2013-05-09  832 days
# 3:  1   A2 2014-09-05  348 days
# 4:  2   B1 2015-03-05   NA days
# 5:  2   B1 2015-07-06   44 days
# 6:  3   A1 2015-02-05  195 days
# 7:  4   B1 2012-09-26   NA days
# 8:  4   B1 2015-02-05  195 days
# 9:  5   B1 2012-09-26 1057 days

This is the (arguably uglier) one-liner version:

DT[J(unique(DT[, ..bycols])), 
  overdue := Sys.Date() - date, on = bycols, mult = "last"]
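
To see which rows the `mult = "last"` join above targets, the same join can be run with `which = TRUE`, which returns row numbers of `DT` instead of modifying it. This is only a sketch: the table is rebuilt inline so the block runs on its own, and the names `groups` and `last_rows` are introduced here for illustration.

```r
library(data.table)

# Sample data from the question, rebuilt inline so the block is self-contained
DT <- data.table(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L),
  cond = c("A1", "A1", "A2", "B1", "B1", "A1", "B1", "B1", "B1"),
  date = as.IDate(c("2012-11-19", "2013-05-09", "2014-09-05",
                    "2015-03-05", "2015-07-06", "2015-02-05",
                    "2012-09-26", "2015-02-05", "2012-09-26"))
)

bycols <- c("id", "cond")

# One row per (id, cond) group; this plays the role of J(unique(...)) above
groups <- unique(DT[, ..bycols])

# For each group, return the index of its last matching row in DT
last_rows <- DT[groups, on = bycols, mult = "last", which = TRUE]
last_rows
```

These should be the same rows that the `.I[.N]` approach in Answer 2 selects.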

Data:

library(data.table)

DT <- data.table(read.table(header=TRUE,text="id cond date
1  A1   2012-11-19
1  A1   2013-05-09
1  A2   2014-09-05
2  B1   2015-03-05
2  B1   2015-07-06
3  A1   2015-02-05
4  B1   2012-09-26
4  B1   2015-02-05
5  B1   2012-09-26"))[, date := as.IDate(date)]

# anyone know how to do this with fread()?
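
On the fread() question in the comment above: a possible sketch, assuming the installed data.table version supports the `text =` argument of `fread()`. The columns are separated by single spaces and `sep` is set explicitly, so nothing depends on separator detection, and the date column is converted explicitly rather than relying on automatic date parsing.

```r
library(data.table)

# Read the sample directly from a string (assumes fread() accepts `text =`)
DT <- fread(text = "id cond date
1 A1 2012-11-19
1 A1 2013-05-09
1 A2 2014-09-05
2 B1 2015-03-05
2 B1 2015-07-06
3 A1 2015-02-05
4 B1 2012-09-26
4 B1 2015-02-05
5 B1 2012-09-26", sep = " ", header = TRUE)

# Explicit conversion, so the column is IDate whatever fread() detected
DT[, date := as.IDate(date)]
str(DT)
```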

Answer 2

First, extract the rows you're interested in, then assign the values:

rows = DT[, .I[.N], by = .(id, cond)]$V1
DT[rows, overdue := Sys.Date() - date]

DT
#   id cond       date   overdue
#1:  1   A1 2012-11-19   NA days
#2:  1   A1 2013-05-09  832 days
#3:  1   A2 2014-09-05  348 days
#4:  2   B1 2015-03-05   NA days
#5:  2   B1 2015-07-06   44 days
#6:  3   A1 2015-02-05  195 days
#7:  4   B1 2012-09-26   NA days
#8:  4   B1 2015-02-05  195 days
#9:  5   B1 2012-09-26 1057 days
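
Because `Sys.Date()` changes every day, the numbers above will not reproduce as shown. The sketch below repeats the same two steps with a fixed reference date. The date 2015-08-19 is an assumption, chosen because it is consistent with the 832-day value in the output; the name `today` and the integer conversion are also additions for illustration.

```r
library(data.table)

# Sample data from the question, rebuilt inline so the block is self-contained
DT <- data.table(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L),
  cond = c("A1", "A1", "A2", "B1", "B1", "A1", "B1", "B1", "B1"),
  date = as.IDate(c("2012-11-19", "2013-05-09", "2014-09-05",
                    "2015-03-05", "2015-07-06", "2015-02-05",
                    "2012-09-26", "2015-02-05", "2012-09-26"))
)

# Assumed reference date standing in for Sys.Date()
today <- as.IDate("2015-08-19")

# .I holds row numbers in DT and .N is the group size,
# so .I[.N] is the row number of the last row in each group
rows <- DT[, .I[.N], by = .(id, cond)]$V1

# Assign only on those rows; all other rows of the new column stay NA
DT[rows, overdue := as.integer(today - date)]
DT
```

The original attempt recycled because `date[.N]` has length 1 within each group, and a grouped `:=` fills every row of the group with that single value. Subsetting to `rows` in `i` first means the assignment touches only the last row of each group.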
