Element-wise operation with two vectors of a data frame in R
My first question here: how can I write an efficient routine that iterates pairwise over the values of two vectors of a given data frame?
To be more specific, consider the following example data frame:
df0 <- data.frame(matrix(c(1,2,2,3,1,3,0.4,0.2,0.2,0.1,0.4,0.1),nrow=6,ncol=2))
colnames(df0) <- c("value","frequency")
The first column is a real value and the second column is a frequency (or weight). Note: the weights must be strictly positive, they may be repeated, and they do not necessarily sum to one (because of the repetition).
I am using the following loop to calculate my function P, which is supposed to be a number between 0 and 1.
# Define two parameters
K = 1/2
alpha = 0
# LOOP
mattemp <- matrix(, nrow = length(df0$value), ncol = length(df0$value))
for (i in 1:length(df0$value)) {
  for (j in 1:length(df0$value)) {
    mattemp[i, j] <- df0$frequency[i]^(1 + alpha) * df0$frequency[j] * abs(df0$value[i] - df0$value[j])
    P <- K * sum(mattemp)
  }
}
Basically, my function P is calculating:
P = K * (0.4^(1+alpha) * 0.2 * |1-2| + 0.4^(1+alpha) * 0.1 * |1-3| + ...)
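In compact form, writing $v_i$ for the value and $f_i$ for the frequency in row $i$ of an $n$-row data frame (these symbols are introduced here for illustration and do not appear in the original post), the double loop appears to compute:

```latex
P = K \sum_{i=1}^{n} \sum_{j=1}^{n} f_i^{\,1+\alpha} \, f_j \, \lvert v_i - v_j \rvert
```

The terms with $i = j$ contribute nothing, because $\lvert v_i - v_i \rvert = 0$.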
This code works perfectly well as long as the matrix is small.
However, I am trying to run this routine on a big matrix (5400 x 5400), and the loop never seems to finish.
I already tried to parallelize it with foreach (using %dopar%), but that did not work either.
Is there a smart and concise routine that R can handle? It does not need to follow the above structure, as long as it is efficient.
Thank you very much.
Try:
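df0$nval <- (df0$value - mean(df0$value)) / sd(df0$value)  # standardize the values
ij <- combn(nrow(df0), 2)  # 2 x choose(n, 2) matrix of index pairs with i < j
foo <- sum(df0$frequency[ij[1, ]] ^ (1 + alpha) * df0$frequency[ij[2, ]] * abs(df0$nval[ij[1, ]] - df0$nval[ij[2, ]]))
# Doubling accounts for the mirrored pairs (j, i); it is exact only for alpha = 0
P <- K * 2 * foo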
Reasoning: Basically, you are evaluating every ordered pair (i, j) of rows, combining their frequencies and normalized values. We use combn to create half of those pairs, and then just vectorize the whole thing. Since combn only returns the unique combinations with i < j, the mirrored pairs (j, i) are missing, so we multiply by 2. [Keep in mind that we don't need the values on the diagonal, as abs(df0$value[i] - df0$value[i]) is equal to 0, so the mirrored pairs are the only cases we are missing.] The doubling is exact when alpha = 0, because then the weight frequency[i] * frequency[j] is the same for (i, j) and (j, i); for any other alpha the two terms differ and both have to be summed explicitly. We then multiply by K and get P.
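As an alternative to combn, the same double sum can be sketched with outer(), which builds the full n x n weight and distance matrices in one vectorized step and therefore needs no doubling and works for any alpha. This is a minimal sketch, not the answer's original method: the function name pairwise_P is made up for illustration, and it uses the raw values from the question rather than the normalized ones.

```r
# Example data from the question
df0 <- data.frame(value = c(1, 2, 2, 3, 1, 3),
                  frequency = c(0.4, 0.2, 0.2, 0.1, 0.4, 0.1))
K <- 1/2
alpha <- 0

# pairwise_P is a hypothetical helper name, not part of the original post
pairwise_P <- function(value, frequency, K, alpha) {
  w <- outer(frequency^(1 + alpha), frequency)  # w[i, j] = f_i^(1 + alpha) * f_j
  d <- abs(outer(value, value, "-"))            # d[i, j] = |v_i - v_j|
  K * sum(w * d)                                # sum over all ordered pairs (i, j)
}

P <- pairwise_P(df0$value, df0$frequency, K, alpha)
print(P)  # 0.72 for this example
```

For n = 5400, each outer() call allocates a 5400 x 5400 numeric matrix of roughly 230 MB, so this trades memory for speed. If that is too much, the rows could be processed in blocks and the partial sums added up.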
It's not clear how you want to normalize, so I just subtracted the mean and divided by the standard deviation. If you meant something else, you can change it accordingly.
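To illustrate what this particular normalization does, here is a small sketch under the assumption that standardization (mean 0, standard deviation 1) is what is wanted. It checks that the manual formula agrees with base R's scale(), and that standardizing only rescales the pairwise sum: every absolute difference is divided by sd(value), so the result is the raw sum divided by sd(value). The helper pair_sum is a hypothetical name used only here, with alpha = 0 and K left out.

```r
value <- c(1, 2, 2, 3, 1, 3)
frequency <- c(0.4, 0.2, 0.2, 0.1, 0.4, 0.1)

# Manual standardization, as in the answer
nval_manual <- (value - mean(value)) / sd(value)
# scale() returns a one-column matrix, so drop it back to a plain vector
nval_scale <- as.numeric(scale(value))

# pair_sum is a hypothetical helper: weighted sum of absolute pairwise
# differences over all ordered pairs (alpha = 0, without the factor K)
pair_sum <- function(x, f) sum(outer(f, f) * abs(outer(x, x, "-")))

raw <- pair_sum(value, frequency)
std <- pair_sum(nval_manual, frequency)

all.equal(nval_manual, nval_scale)  # TRUE
all.equal(std, raw / sd(value))     # TRUE
```

Subtracting the mean has no effect on the differences, so with this choice only the division by the standard deviation changes P.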
Edit 1: Big thanks to @alexis_laz for finding a mistake and suggesting improvements that almost double the speed!
Edit 2: Adjusted the script to fit the changed requirements.