How to express a value as a product of previous rows
I'm trying to initialize 3 columns of a data.table as a product of previous rows.
I have a formula like this:
tableA[0,] = values
for each col:
tableA[i, col] = tableA[i - 1, col] * exp(tableB[i - 1, col] * tableC[i, col])
(columns 1, 2 and 3 are completely independent)
I feel like I should use cumprod, but I really don't see how (of course, I can solve the problem with a for loop in 5 seconds).
Could anyone please help me?
Also, a bonus question: is there a reference site with examples translating various mathematical formulas with sums or products, so that I could familiarize myself with not using for loops?
I have written a little example below.
For example, the expected values for tableA[, .(valA1)] would be:
1.2, 1.620, 3.605, 8.867, 19.734
assuming no mistake in Excel.
Thank you!
library(data.table)
tableA <- data.table(valA1 = rep(0, 5), valA2 = rep(0, 5), valA3 = rep(0, 5))
tableA[1, valA1 := 1.2]
tableA[1, valA2 := 1.1]
tableA[1, valA3 := 1.3]
tableB <- data.table(valB1 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                     valB2 = c(0.2, 0.3, 0.4, 0.5, 0.6),
                     valB3 = c(0.3, 0.4, 0.5, 0.6, 0.7))
tableC <- data.table(valC1 = c(2, 3, 4, 3, 2),
                     valC2 = c(1, 2, 3, 1, 2),
                     valC3 = c(5, 3, 3, 2, 2))
tableA[, ValB1 := shift(tableB$valB1)]
tableA[, ValB2 := shift(tableB$valB2)]
tableA[, ValB3 := shift(tableB$valB3)]
tableA[, ValC1 := tableC$valC1]
tableA[, ValC2 := tableC$valC2]
tableA[, ValC3 := tableC$valC3]
tableA[, Exp1 := exp(ValB1 * ValC1)]
tableA[, Exp2 := exp(ValB2 * ValC2)]
tableA[, Exp3 := exp(ValB3 * ValC3)]
# tableA[-1, ValA1 := cumprod something?]
You were nearly there.
cumprod()
According to OP's description, the rule for computing the ValA columns iteratively is
$a_i = a_{i-1} \cdot f_i$
where the $f_i$ are constants which are computed from the ValB and ValC columns.
So, for i = 1 we get
$a_1 = a_0 \cdot f_1$,
for i = 2, we get
$a_2 = a_1 \cdot f_2 = a_0 \cdot f_1 \cdot f_2$,
and so forth. If this is continued to i = n, we get
$a_n = a_0 \cdot f_1 \cdot f_2 \cdots f_{n-1} \cdot f_n$,
which is the n-th element of a_0 * cumprod(f).
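The equivalence derived above can be checked numerically. The following is only a minimal sketch: the start value a0 and the factors f are made-up numbers, not taken from the question.

```r
# Hypothetical start value and factors (assumptions for illustration only)
a0 <- 2
f  <- c(1, 1.5, 0.5, 2, 3)

# Iterative version of the rule a_i = a_{i-1} * f_i
a_loop <- numeric(length(f))
prev <- a0
for (i in seq_along(f)) {
  prev <- prev * f[i]
  a_loop[i] <- prev
}

# Vectorised version: every a_i at once
a_vec <- a0 * cumprod(f)

all.equal(a_loop, a_vec)
```

Both vectors hold the same running products, which is why the loop can be replaced by a single call to cumprod().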
fill parameter for shift() is required
Now, we need to consider the value of $b_0$. According to OP's formula
$f_i = e^{b_{i-1} c_i}$
For i = 1, this becomes
$f_1 = e^{b_0 c_1}$
so we need to define $b_0$ as 0 in the call to shift()
shift(ValB, fill = 0)
so that $f_1 = 1$. Otherwise, $f_1$ would be NA and cumprod(f) would become NA as well.
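To see the effect of the fill argument, here is a small sketch using the valB1 and valC1 values from the question; the variable names b, c1, f_na and f_ok are chosen only for this illustration.

```r
library(data.table)

b  <- c(0.1, 0.2, 0.3, 0.4, 0.5)  # valB1 from the question
c1 <- c(2, 3, 4, 3, 2)            # valC1 from the question

shift(b)            # default fill is NA: NA 0.1 0.2 0.3 0.4
shift(b, fill = 0)  # b_0 defined as 0:   0.0 0.1 0.2 0.3 0.4

# With the default, the leading NA propagates through the whole cumulative product
f_na <- cumprod(exp(shift(b) * c1))

# With fill = 0 the first factor is exp(0) = 1, so the start value is preserved
f_ok <- cumprod(exp(shift(b, fill = 0) * c1))
```

Here f_na consists only of NA values, while f_ok starts at 1.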
Now, this can be wrapped up in a function definition:
myfunc <- function(a, b, c) first(a) * cumprod(exp(shift(b, fill = 0) * c))
When called with the sample datasets
myfunc(tableA$valA1, tableB$valB1, tableC$valC1)
it returns the expected result:
[1] 1.200000 1.619831 3.604999 8.866867 19.733576
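As a cross-check, the vectorised function can be compared with the explicit for loop mentioned in the question. This is a sketch that rebuilds the first column's sample data as plain vectors; a_loop is a name introduced here for illustration.

```r
library(data.table)

myfunc <- function(a, b, c) first(a) * cumprod(exp(shift(b, fill = 0) * c))

# Sample data for the first column, as in the question
a <- c(1.2, 0, 0, 0, 0)
b <- c(0.1, 0.2, 0.3, 0.4, 0.5)
c <- c(2, 3, 4, 3, 2)

# Explicit loop: a[i] = a[i - 1] * exp(b[i - 1] * c[i])
a_loop <- a
for (i in 2:length(a)) {
  a_loop[i] <- a_loop[i - 1] * exp(b[i - 1] * c[i])
}

all.equal(a_loop, myfunc(a, b, c))
```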
The OP has pointed out that
"columns 1, 2 and 3 are completely independent"
Coding the same operations for a different set of columns is quite tedious. Therefore, I suggest reshaping and combining the datasets.
A <- melt(tableA, measure.vars = patterns("valA"), value.name = "a")
A[, b := melt(tableB, measure.vars = patterns("valB"))$value]
A[, c := melt(tableC, measure.vars = patterns("valC"))$value]
A[]
    variable   a   b c
 1:    valA1 1.2 0.1 2
 2:    valA1 0.0 0.2 3
 3:    valA1 0.0 0.3 4
 4:    valA1 0.0 0.4 3
 5:    valA1 0.0 0.5 2
 6:    valA2 1.1 0.2 1
 7:    valA2 0.0 0.3 2
 8:    valA2 0.0 0.4 3
 9:    valA2 0.0 0.5 1
10:    valA2 0.0 0.6 2
11:    valA3 1.3 0.3 5
12:    valA3 0.0 0.4 3
13:    valA3 0.0 0.5 3
14:    valA3 0.0 0.6 2
15:    valA3 0.0 0.7 2
Now, all input data are combined in one data.table in long format, where each independent dataset is identified by the value of variable. myfunc() can be applied to each group:
A[, myfunc(a, b, c), by = variable]
    variable        V1
 1:    valA1  1.200000
 2:    valA1  1.619831
 3:    valA1  3.604999
 4:    valA1  8.866867
 5:    valA1 19.733576
 6:    valA2  1.100000
 7:    valA2  1.641007
 8:    valA2  4.036226
 9:    valA2  6.021342
10:    valA2 16.367705
11:    valA3  1.300000
12:    valA3  3.197484
13:    valA3 10.616021
14:    valA3 28.857337
15:    valA3 95.809732
This result can be reshaped to wide format again:
dcast(A[, myfunc(a, b, c), by = variable], rowid(variable) ~ variable)[, variable := NULL][]
       valA1     valA2     valA3
1:  1.200000  1.100000  1.300000
2:  1.619831  1.641007  3.197484
3:  3.604999  4.036226 10.616021
4:  8.866867  6.021342 28.857337
5: 19.733576 16.367705 95.809732
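If the three tables are already aligned column by column, a possible alternative to reshaping is to apply myfunc() to matching columns with Map(). This is a sketch of that alternative, not the approach used above; it assumes tableA still contains only its three original valA columns (without the helper columns added in the question), and res is a name introduced here.

```r
library(data.table)

myfunc <- function(a, b, c) first(a) * cumprod(exp(shift(b, fill = 0) * c))

# Sample data as defined in the question
tableA <- data.table(valA1 = c(1.2, 0, 0, 0, 0),
                     valA2 = c(1.1, 0, 0, 0, 0),
                     valA3 = c(1.3, 0, 0, 0, 0))
tableB <- data.table(valB1 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                     valB2 = c(0.2, 0.3, 0.4, 0.5, 0.6),
                     valB3 = c(0.3, 0.4, 0.5, 0.6, 0.7))
tableC <- data.table(valC1 = c(2, 3, 4, 3, 2),
                     valC2 = c(1, 2, 3, 1, 2),
                     valC3 = c(5, 3, 3, 2, 2))

# Map() walks over the columns of the three tables in parallel;
# the result columns take their names from tableA
res <- as.data.table(Map(myfunc, tableA, tableB, tableC))
res
```

This relies on the columns appearing in the same order in all three tables, whereas the long-format approach keeps the grouping explicit in the variable column.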