
Refer to previous row in data.table in R, with a condition

I have a new problem with this data, because my full data looks like this:

library(data.table)

a <- data.table(A = 1:10,
                B = c(1, 2, 0, 2, 0, 0, 3, 4, 0, 2),
                C = c(2, 3, 1, 4, 5, 3, 6, 7, 2, 2),
                d = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))  # grouping column, used as by=d below

#      A B C d
# 1:  1 1 2 1
# 2:  2 2 3 1
# 3:  3 0 1 1
# 4:  4 2 4 1
# 5:  5 0 5 1
# 6:  6 0 3 2
# 7:  7 3 6 2
# 8:  8 4 7 2
# 9:  9 0 2 2
#10: 10 2 2 2

Now I want to create a new column that multiplies A by B/C taken from the closest previous row in which B is not 0. For example, in row 2 I can calculate D = 2*(1/2). In row 4, however, it has to be 4*(2/3); it cannot be 4*(0/1). I use

a[, D:= {i1 <- (NA^!B)
list( A*shift(na.locf(i1*B))/shift(na.locf(i1*C)))},by=d]

as Akrun recommended yesterday. It does not work when I calculate it by group. The result looks like this:

#      A B C d        D
# 1:  1 1 2 1       NA
# 2:  2 2 3 1 1.000000
# 3:  3 0 1 1 2.000000
# 4:  4 2 4 1 2.666667
# 5:  5 0 5 1 2.500000
# 6:  6 0 3 2       NA
# 7:  7 3 6 2 3.500000
# 8:  8 4 7 2 4.571429
# 9:  9 0 2 2 5.142857
# 10: 10 2 2 2       NA

Does anyone know what the problem is here? The error is "longer object length is not a multiple of shorter object length".

We can replace the elements of 'B' and 'C' that correspond to a 0 value in 'B' with NA. Then use na.locf from zoo to replace those NA values with the previous non-NA elements, shift the elements (by default this gives a lag of 1), divide the modified column 'B' by 'C', and multiply by 'A'. Assign ( := ) the output to a new column 'D'.

 library(zoo)
 a[B==0, c('B', 'C'):=list(NA, NA)]
 a[, c('B', 'C'):= na.locf(.SD), .SDcols=B:C]
 a[,  D:= {tmp <- shift(.SD[, 2:3, with=FALSE])
           A*(tmp[[1]]/tmp[[2]])}]

Or we can make it compact. We get a logical vector ( !B ) that is TRUE for the 0 elements of 'B' and convert it to a vector of 1s and NAs ( NA^ ). Multiplying this with columns 'B' and 'C' replaces the 1s with the corresponding elements of those columns, while the NAs stay NA. Then apply na.locf (as before), shift, and do the multiplication/division.

a[, D:= {i1 <- (NA^!B)
   list( A*shift(na.locf(i1*B))/shift(na.locf(i1*C)))}]
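To see what the NA^!B step produces on its own, here is a minimal base-R sketch. The vector B below is just the first group of the question's data, copied for illustration.

```r
# First five values of column B from the question (illustrative copy)
B <- c(1, 2, 0, 2, 0)

# !B is TRUE where B == 0; NA^TRUE is NA, while NA^FALSE (i.e. NA^0) is 1
i1 <- NA^!B

i1       # 1 wherever B is non-zero, NA wherever B is 0
i1 * B   # B with its zeros turned into NA, ready for na.locf
```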

Or, instead of calling shift/na.locf twice:

a[,  D:= {i1 <- (NA^!B)
      tmp <- shift(na.locf(i1*.SD))
      a[['A']]*(tmp[[1]]/tmp[[2]])}, .SDcols=B:C]
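The variants above run on the whole table. For the grouped case in the question, a likely cause of the length warning is that zoo's na.locf drops leading NAs by default (na.rm = TRUE): the second group starts with B == 0, so its filled vector comes back one element shorter than A and gets recycled. The sketch below assumes that is the cause and passes na.rm = FALSE; the grouping column name d is taken from the question's by=d.

```r
library(data.table)
library(zoo)

a <- data.table(A = 1:10,
                B = c(1, 2, 0, 2, 0, 0, 3, 4, 0, 2),
                C = c(2, 3, 1, 4, 5, 3, 6, 7, 2, 2),
                d = rep(1:2, each = 5))

a[, D := {
  i1 <- NA^!B   # NA where B == 0, 1 elsewhere
  # na.rm = FALSE keeps leading NAs, so each group's vector keeps its full length
  A * shift(na.locf(i1 * B, na.rm = FALSE)) /
      shift(na.locf(i1 * C, na.rm = FALSE))
}, by = d]

a
```

With this change the first two rows of group 2 are NA, because no earlier row in that group has a non-zero B to carry forward. If values should carry over across groups instead, the ungrouped versions above already do that.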

This can be done with a rolling join:

a[, row := .I]
a[, B/C, by=row][V1 != 0][a, A*shift(V1), on="row", roll=TRUE]
# [1]       NA 1.000000 2.000000 2.666667 2.500000 3.000000 3.500000 4.000000
# [9] 5.142857 5.714286
