
How to express previous value as product of previous rows

I'm trying to initialize 3 columns of a data.table as a product of previous rows.
I have a formula like this:

tableA[0,] = values  

for each col:

tableA[i, col] = tableA[i - 1, col] * exp(tableB[i - 1, col] * tableC[i, col])

(columns 1, 2 and 3 are completely independent)

I feel like I should use cumprod but I really don't see how (of course I can solve the problem with a for loop in 5 sec).

Could anyone please help me?


Also, a bonus question: is there a reference site with examples that translate various mathematical formulas with sums or products into code, so that I could familiarize myself with not using for loops?

I have written a little example below.
For example, the expected values for tableA[, .(valA1)] would be:

1.2, 1.620, 3.605, 8.867, 19.734  

assuming no mistake in Excel

Thank you!

library(data.table)

tableA <- data.table(valA1 = rep(0, 5), valA2 = rep(0, 5), valA3 = rep(0, 5))
tableA[1, valA1 := 1.2]
tableA[1, valA2 := 1.1]
tableA[1, valA3 := 1.3]
    
tableB <- data.table(valB1 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                     valB2 = c(0.2, 0.3, 0.4, 0.5, 0.6),
                     valB3 = c(0.3, 0.4, 0.5, 0.6, 0.7))
tableC <- data.table(valC1 = c(2, 3, 4, 3, 2),
                     valC2 = c(1, 2, 3, 1, 2),
                     valC3 = c(5, 3, 3, 2, 2))
tableA[, ValB1 := shift(tableB$valB1)]
tableA[, ValB2 := shift(tableB$valB2)]
tableA[, ValB3 := shift(tableB$valB3)]
    
tableA[, ValC1 := tableC$valC1]
tableA[, ValC2 := tableC$valC2]
tableA[, ValC3 := tableC$valC3]
    
tableA[, Exp1 := exp(ValB1 * ValC1)]
tableA[, Exp2 := exp(ValB2 * ValC2)]
tableA[, Exp3 := exp(ValB3 * ValC3)]
    
# tableA[-1, valA1 := cumprod something?]   <- this is the step I cannot work out

You were nearly there.

How to use cumprod()

According to OP's description, the rule for computing the valA columns iteratively is

$a_i = a_{i-1} \cdot f_i$

where the $f_i$ are constants which are computed from the valB and valC columns.

So, for i = 1 we get

$a_1 = a_0 \cdot f_1$,

for i = 2, we get

$a_2 = a_1 \cdot f_2 = a_0 \cdot f_1 \cdot f_2$,

and so forth. Continuing this up to i = n, we get

$a_n = a_0 \cdot f_1 \cdot f_2 \cdots f_{n-1} \cdot f_n$,

which is the n-th element of a_0 * cumprod(f).
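As a quick sanity check of this identity, the following minimal sketch compares an explicit loop with cumprod() in base R. The start value a0 and the factors f are made up for illustration and are not taken from the question.

```r
# Hypothetical start value and factors, chosen only for illustration
a0 <- 2
f  <- c(1, 1.5, 0.5, 3)

# Iterative rule: a_i = a_{i-1} * f_i
a_loop <- numeric(length(f))
prev <- a0
for (i in seq_along(f)) {
  a_loop[i] <- prev * f[i]
  prev <- a_loop[i]
}

# Vectorized form: a_i = a_0 * f_1 * ... * f_i for every i at once
a_vec <- a0 * cumprod(f)

all.equal(a_loop, a_vec)
```

Both vectors hold the same values, so the loop can be replaced by a single call to cumprod().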

A fill parameter for shift() is required

Now, we need to consider the value of $b_0$. According to OP's formula

$f_i = e^{b_{i-1} c_i}$

For i = 1, this becomes

$f_1 = e^{b_0 c_1}$

so we need to define $b_0$ as 0 in the call to shift()

shift(ValB, fill = 0)

so that $f_1 = 1$. Otherwise, $f_1$ would be NA (the default fill value of shift()) and cumprod(f) would become NA as well.
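The effect of the fill argument can be seen in a small sketch that reuses the valB1 and valC1 values from the question. It assumes the data.table package is installed; the variable names b, c1, f_na and f_ok are only illustrative.

```r
library(data.table)

b  <- c(0.1, 0.2, 0.3, 0.4, 0.5)  # valB1 from the question
c1 <- c(2, 3, 4, 3, 2)            # valC1 from the question

# Default fill is NA, so the first factor is NA and cumprod() propagates it
f_na <- exp(shift(b) * c1)
cumprod(f_na)

# With fill = 0 the first factor is exp(0) = 1 and the product is well defined
f_ok <- exp(shift(b, fill = 0) * c1)
cumprod(f_ok)
```

The first call yields only NA values, while the second yields a usable vector of cumulative factors.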

Define a function

Now, this can be wrapped up in a function definition:

myfunc <- function(a, b, c) first(a) * cumprod(exp(shift(b, fill = 0) * c))

When called with the sample datasets

myfunc(tableA$valA1, tableB$valB1, tableC$valC1)

it returns the expected result:

[1]  1.200000  1.619831  3.604999  8.866867 19.733576

Dealing with multiple independent columns

The OP has pointed out that

columns 1, 2 and 3 are completely independent

Coding the same operations for a different set of columns is quite tedious. Therefore, I suggest reshaping and combining the datasets.

A <- melt(tableA, measure.vars = patterns("valA"), value.name = "a")
A[, b := melt(tableB, measure.vars = patterns("valB"))$value]
A[, c := melt(tableC, measure.vars = patterns("valC"))$value]
A[]
    variable   a   b c
 1:    valA1 1.2 0.1 2
 2:    valA1 0.0 0.2 3
 3:    valA1 0.0 0.3 4
 4:    valA1 0.0 0.4 3
 5:    valA1 0.0 0.5 2
 6:    valA2 1.1 0.2 1
 7:    valA2 0.0 0.3 2
 8:    valA2 0.0 0.4 3
 9:    valA2 0.0 0.5 1
10:    valA2 0.0 0.6 2
11:    valA3 1.3 0.3 5
12:    valA3 0.0 0.4 3
13:    valA3 0.0 0.5 3
14:    valA3 0.0 0.6 2
15:    valA3 0.0 0.7 2

Now, all input data are combined in one data.table in long format, where each independent dataset is identified by the value of variable. myfunc() can be applied to each group:

A[, myfunc(a, b, c), by = variable]
    variable        V1
 1:    valA1  1.200000
 2:    valA1  1.619831
 3:    valA1  3.604999
 4:    valA1  8.866867
 5:    valA1 19.733576
 6:    valA2  1.100000
 7:    valA2  1.641007
 8:    valA2  4.036226
 9:    valA2  6.021342
10:    valA2 16.367705
11:    valA3  1.300000
12:    valA3  3.197484
13:    valA3 10.616021
14:    valA3 28.857337
15:    valA3 95.809732

This result can be reshaped to wide format again:

dcast(A[, myfunc(a, b, c), by = variable], rowid(variable) ~ variable)[, variable := NULL][]
       valA1     valA2     valA3
1:  1.200000  1.100000  1.300000
2:  1.619831  1.641007  3.197484
3:  3.604999  4.036226 10.616021
4:  8.866867  6.021342 28.857337
5: 19.733576 16.367705 95.809732
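If reshaping feels heavy for only three columns, one possible alternative (not part of the approach above, just a sketch) is to apply myfunc() column by column with Map(). It relies on tableA, tableB and tableC listing their columns in matching order, which is an assumption that happens to hold for the sample data.

```r
library(data.table)

myfunc <- function(a, b, c) first(a) * cumprod(exp(shift(b, fill = 0) * c))

# Sample data from the question; only the first row of tableA matters
tableA <- data.table(valA1 = c(1.2, 0, 0, 0, 0),
                     valA2 = c(1.1, 0, 0, 0, 0),
                     valA3 = c(1.3, 0, 0, 0, 0))
tableB <- data.table(valB1 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                     valB2 = c(0.2, 0.3, 0.4, 0.5, 0.6),
                     valB3 = c(0.3, 0.4, 0.5, 0.6, 0.7))
tableC <- data.table(valC1 = c(2, 3, 4, 3, 2),
                     valC2 = c(1, 2, 3, 1, 2),
                     valC3 = c(5, 3, 3, 2, 2))

# A data.table is a list of columns, so Map() pairs up the
# first, second and third columns of the three tables by position
result <- as.data.table(Map(myfunc, tableA, tableB, tableC))
result
```

The result keeps the column names of tableA. The long-format approach is more robust when the column order of the tables cannot be trusted, because it matches the datasets explicitly.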
