
Get the last row of a previous group in data.table

This is what my data table looks like:

library(data.table)
dt <- fread('
    Product  Group    LastProductOfPriorGroup
    A          1          NA
    B          1          NA
    C          2          B
    D          2          B
    E          2          B
    F          3          E
    G          3          E
')

The LastProductOfPriorGroup column is my desired column. I am trying to fetch the product from the last row of the prior group. In the first two rows there is no prior group, so the value is NA. In the third row, the product in the last row of the prior group (group 1) is B. I tried to accomplish this with

dt[,LastGroupProduct:= shift(Product,1), by=shift(Group,1)]

to no avail.

You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1:       A     1                      NA     NA
2:       B     1                      NA     NA
3:       C     2                       B      B
4:       D     2                       B      B
5:       E     2                       B      B
6:       F     3                       E      E
7:       G     3                       E      E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

  • dt[, last(Product), by = Group]$V1 gets the last value from each group as a character vector.
  • shift(...) shifts that character vector by one position, so each group lines up with the last value of the group before it.
  • dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal .GRP group counter to index into the shifted vector.
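The intermediate values behind those three steps can be inspected one at a time. This is only a sketch on the question's sample data; the names last_per_group, shifted and grp_index are illustrative and not part of the original answer.

```r
library(data.table)

# Sample data from the question (without the desired column)
dt <- data.table(
  Product = c("A", "B", "C", "D", "E", "F", "G"),
  Group   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L)
)

# Step 1: last product of each group, one element per group
last_per_group <- dt[, last(Product), by = Group]$V1
print(last_per_group)

# Step 2: shift by one position, so element g holds the last product of group g - 1
shifted <- shift(last_per_group)
print(shifted)

# Step 3: .GRP is the running group counter that is used as the index into shifted
grp_index <- dt[, .(grp = .GRP), by = Group]
print(grp_index)
```

Indexing shifted with .GRP inside a grouped := then recycles the single looked-up value across all rows of the group.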

Update: Frank brings up a good point: my code above recalculates the shift for every group, over and over again. To avoid that, we can use either

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that the shift is calculated only once instead of once per group. Or, we can take Frank's nice suggestion in the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v] 
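For readers who find the one-liner dense, here is a sketch that splits the same update join into two steps. The names lookup, last_product and prev_last are made up for illustration; the original code uses the default column V1 and the name v instead.

```r
library(data.table)

# Sample data from the question (without the desired column)
dt <- data.table(
  Product = c("A", "B", "C", "D", "E", "F", "G"),
  Group   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L)
)

# One row per group: its own last product, plus the previous group's last product
lookup <- dt[, .(last_product = last(Product)), by = Group]
lookup[, prev_last := shift(last_product)]
print(lookup)

# Update join: match rows of dt to lookup on Group and copy prev_last into newcol.
# The i. prefix refers to a column of the table passed in the i argument (lookup).
dt[lookup, on = "Group", newcol := i.prev_last]
print(dt)
```

Here the shift is computed once on the small per-group table, and the join copies the result back to every row of dt.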

Another way is to save the previous group's value in a variable.

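this = NA_character_    # initialize: there is no group before the first one
# prev holds the previous group's last product; this is updated for the next group
dt[, LastProductOfPriorGroup := { prev <- this; this <- last(Product); prev }, by = Group]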
dt
   Product Group LastProductOfPriorGroup
1:       A     1                      NA
2:       B     1                      NA
3:       C     2                       B
4:       D     2                       B
5:       E     2                       B
6:       F     3                       E
7:       G     3                       E

NB: last() is a data.table function which returns the last item of a vector (of the Product column in this case).

This should also be fast since no extra lookup is needed to fetch the previous group's value; it just relies on the groups being processed in order (which they are).
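One caveat on "in order": with by, data.table visits groups in order of first appearance, which is not necessarily sorted order. The sketch below uses a small made-up table (dt2 is hypothetical, not from the question) to show this, and sorts with setorder() first so that "prior group" means the numerically previous one.

```r
library(data.table)

# Hypothetical unsorted data: group 3 appears before groups 1 and 2
dt2 <- data.table(
  Product = c("F", "A", "C", "B"),
  Group   = c(3L, 1L, 2L, 1L)
)

# by = Group processes groups in order of first appearance: 3, then 1, then 2
visit_order <- dt2[, .(grp = .GRP), by = Group]
print(visit_order)

# Sorting by Group first makes the processing order match the numeric order
setorder(dt2, Group)
shifted <- shift(dt2[, last(Product), by = Group]$V1)
dt2[, newcol := shifted[.GRP], by = Group]
print(dt2)
```

The sample data in the question is already sorted by Group, so none of the approaches above need this extra step there.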
