
R data.table values depending on previous row

I am trying to solve the problem proposed here. Basically, for each row of the data.table, I need to take the values of each variable in the previous row and use them to define the variable values in the current row.

I have tried using data.table, but the result is quite bulky and, I believe, extremely inefficient (especially for a large number of rows). I also tried the shift() function, but could not fit it into my temporary solution.

Here is a toy example:

library(data.table)
DT = data.table(a = numeric(10L), b = numeric(10L), c = numeric(10L), iter = 1:10)

for(i in DT[,.I]){

  DT[i, c('a','b','c') := {
    if(iter == 1) {
      a = 1
      b = 2
      c = 3
    } else { # if it is not the first iteration
      a = DT[i-1, a + b] # read the values from the previous row to compute the new values
      b = DT[i-1, b] - a
      c = a / b + DT[i-1, c]
    }

    .(a, b, c)
  }]

}

And here is the output:

     a  b          c iter
 1:  1  2  3.0000000    1
 2:  3 -1  0.0000000    2
 3:  2 -3 -0.6666667    3
 4: -1 -2 -0.1666667    4
 5: -3  1 -3.1666667    5
 6: -2  3 -3.8333333    6
 7:  1  2 -3.3333333    7
 8:  3 -1 -6.3333333    8
 9:  2 -3 -7.0000000    9
10: -1 -2 -6.5000000   10

Can someone help me improve the code?

Note: This is not a general answer to the OP's problem, just to the toy example posted.

Your values of a and b repeat on a cycle of six iterations, and c is a cumulative sum of a/b. As a result, nothing has to be computed iteratively: there is a closed-form solution for any iteration number:

f = function(i, a0 = 1, b0 = 2, c0 = 2.5){
  trio = c(a0, a0+b0, b0)
  a = c(trio, -trio)
  b = -c(tail(a, 1L), head(a, -1L))

  cs = cumsum(a/b)
  c6 = tail(cs, 1L)

  k = (i - 1L) %/% 6L
  ii = 1L + (i - 1L) %% 6L

  list(a = a[ii], b = b[ii], c = c0 + k*c6 + cs[ii])
}

library(data.table)
DT = data.table(iter = 1:10)[, c("a", "b", "c") := f(iter)][]

    iter  a  b          c
 1:    1  1  2  3.0000000
 2:    2  3 -1  0.0000000
 3:    3  2 -3 -0.6666667
 4:    4 -1 -2 -0.1666667
 5:    5 -3  1 -3.1666667
 6:    6 -2  3 -3.8333333
 7:    7  1  2 -3.3333333
 8:    8  3 -1 -6.3333333
 9:    9  2 -3 -7.0000000
10:   10 -1 -2 -6.5000000

That is, you can skip ahead to any iteration:

> setDT(f(10))[]
    a  b    c
1: -1 -2 -6.5
> setDT(f(100))[]
    a  b      c
1: -1 -2 -101.5
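
The six-step cycle that the closed form relies on can be checked directly. The following is a minimal base R sketch, not part of the original answer; the helper name `iterate_abc` is hypothetical, and it simply runs the recurrence from the question and compares one cycle with the next.

```r
# Hypothetical helper: run the recurrence from the question for n steps
iterate_abc <- function(n, a = 1, b = 2, c = 3) {
  out <- matrix(NA_real_, nrow = n, ncol = 3,
                dimnames = list(NULL, c("a", "b", "c")))
  for (i in seq_len(n)) {
    out[i, ] <- c(a, b, c)
    a <- a + b      # new a uses the previous a and b
    b <- b - a      # new b uses the previous b and the new a
    c <- a / b + c  # new c uses the new a and b plus the previous c
  }
  out
}

m <- iterate_abc(13)

# a and b repeat with period 6: rows 7..12 equal rows 1..6
period_ok <- all(m[7:12, c("a", "b")] == m[1:6, c("a", "b")])

# c shifts by the same amount over every full cycle
c_step <- m[7, "c"] - m[1, "c"]
c_step_ok <- isTRUE(all.equal(m[13, "c"] - m[7, "c"], c_step))
```

Both checks should come out TRUE for the toy starting values. For other starting values b can hit zero, in which case a/b is no longer finite, so the check is only meant for the example as posted.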

You can use Reduce with accumulate = TRUE:

fun <- function(x, junk){
 x[1] <- sum(x[1:2])
 x[2] <- diff(x[1:2])
 x[3] <- x[1]/x[2] + x[3]
 x
}

dt <- 
  as.data.table(do.call(rbind, Reduce(fun, numeric(9L), accumulate = TRUE, init = 1:3)))

setnames(dt, c('a', 'b', 'c'))

dt
#      a  b          c
#  1:  1  2  3.0000000
#  2:  3 -1  0.0000000
#  3:  2 -3 -0.6666667
#  4: -1 -2 -0.1666667
#  5: -3  1 -3.1666667
#  6: -2  3 -3.8333333
#  7:  1  2 -3.3333333
#  8:  3 -1 -6.3333333
#  9:  2 -3 -7.0000000
# 10: -1 -2 -6.5000000

You can use transpose instead of do.call(rbind, ...), as below, but if you have tidyverse or purrr loaded, make sure the transpose being called is data.table::transpose:

dt <- 
  as.data.table(transpose(Reduce(fun, numeric(9L), accumulate = TRUE, init = 1:3)))

Explanation for junk:

At each iteration, Reduce passes the previous output (or init) together with the i-th element of its x argument to f. So even if your function does not use the elements of x, it still needs a second parameter to receive them. Without this extra "junk" parameter, the call fails with an "unused argument" error, because Reduce passes two arguments to a function that accepts only one.
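
The two-argument calling convention can be seen in a small base R sketch. The function names `step` and `one_arg` are made up for this illustration and do not appear in the answer above.

```r
# Reduce calls f(accumulated_value, next_element_of_x) at every step
running_sum <- Reduce(function(acc, x) acc + x, 1:4, accumulate = TRUE)

# A step function that ignores the second argument still has to accept it
step <- function(state, junk) state * 2
doubled <- Reduce(step, numeric(3L), accumulate = TRUE, init = 1)

# With only one parameter, the call fails; capture the error message
one_arg <- function(state) state * 2
res <- tryCatch(
  Reduce(one_arg, numeric(3L), accumulate = TRUE, init = 1),
  error = function(e) conditionMessage(e)
)
```

Here `running_sum` accumulates the elements of x, `doubled` depends only on the previous state (the three zeros in x just set the number of steps), and `res` holds the error message instead of a result.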

Another option:

cols <- c('a','b','c')
A <- 1; B <- 2; C <- 3
DT[iter==1, (cols) := .(A, B, C)]
DT[iter>1, 
    (cols) := {
        A = A + B
        B = B - A
        C = A / B + C
        .(A, B, C)
    },
    by=iter]

In fact, you can solve your problem with a recursive function that propagates the values from one call to the next, so there is no need to read the values of the previous row. In base R you could do it like this:

DT = data.frame(a = numeric(10L), b = numeric(10L), c = numeric(10L), iter = 1:10)

fun <- function(a, b, c, n) {
  a <- a + b
  b <- b - a
  c <- a/b + c
  n <- n - 1
  if(n<=0) {return(c(a,b,c))}
  return(rbind(c(a,b,c),fun(a,b,c,n)))
}

DT[1,1:3] <- 1:3
DT[-1,1:3] <- fun(DT[1,1], DT[1,2], DT[1,3], 9)
DT

    a  b          c iter
1   1  2  3.0000000    1
2   3 -1  0.0000000    2
3   2 -3 -0.6666667    3
4  -1 -2 -0.1666667    4
5  -3  1 -3.1666667    5
6  -2  3 -3.8333333    6
7   1  2 -3.3333333    7
8   3 -1 -6.3333333    8
9   2 -3 -7.0000000    9
10 -1 -2 -6.5000000   10

Alternatively, you can simply write a for loop:

DT = data.frame(a = numeric(10L), b = numeric(10L), c = numeric(10L), iter = 1:10)
a <- 1
b <- 2
c <- 3
for(i in seq_len(nrow(DT))) {
  DT[i,1:3] <- c(a,b,c)
  a <- a + b
  b <- b - a
  c <- a/b + c
}
DT

    a  b          c iter
1   1  2  3.0000000    1
2   3 -1  0.0000000    2
3   2 -3 -0.6666667    3
4  -1 -2 -0.1666667    4
5  -3  1 -3.1666667    5
6  -2  3 -3.8333333    6
7   1  2 -3.3333333    7
8   3 -1 -6.3333333    8
9   2 -3 -7.0000000    9
10 -1 -2 -6.5000000   10

But this will also be slow. A fast solution is given, e.g., by IceCreamToucan.
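
If a loop is still preferred, one option not mentioned in the answers above is data.table's set(), which assigns by reference and skips the per-call overhead of `[<-.data.frame` and `[.data.table`. The sketch below assumes the same toy recurrence and starting values as the question; how much it helps on real data would need to be measured.

```r
library(data.table)

n <- 10L
DT <- data.table(a = numeric(n), b = numeric(n), c = numeric(n),
                 iter = seq_len(n))

# Running state, starting from the values used in the question
a <- 1; b <- 2; c <- 3
cols <- c("a", "b", "c")

for (i in seq_len(n)) {
  # Write the current state into row i by reference
  set(DT, i = i, j = cols, value = list(a, b, c))
  # Advance the state for the next row
  a <- a + b
  b <- b - a
  c <- a / b + c
}
```

The state is kept in ordinary variables rather than read back from the previous row, so each iteration does one cheap write and no lookup.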
