Element-wise operations on two vectors of a data frame in R

Question

My first question here: how can I efficiently iterate pairwise over the values of two vectors of a given data frame?

To be more specific, consider the following example data frame:

df0 <- data.frame(matrix(c(1,2,2,3,1,3,0.4,0.2,0.2,0.1,0.4,0.1),nrow=6,ncol=2))
colnames(df0) <- c("value","frequency")

The first column is a real value and the second column is a frequency (or weight). Note: the weights must be strictly positive, they may be repeated, and they do not necessarily add up to one (because of the repetition).

I am running the following loop to calculate my function P, which is supposed to be a number between 0 and 1.

# Define two parameters
K = 1/2
alpha = 0

# LOOP
mattemp <- matrix(,nrow=length(df0$value), ncol=length(df0$value))

for(i in 1:length(df0$value)) {
  for(j in 1:length(df0$value)) {

    mattemp[i,j] <- df0$frequency[i]^(1+alpha) * df0$frequency[j] * abs(df0$value[i]-df0$value[j])

    P <- K * sum(mattemp)
  }
}

Basically, my function P is calculating:

P = K * (0.4^(1+alpha) * 0.2 * |1-2| + 0.4^(1+alpha) * 0.1 * |1-3| + ...)
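Written out as a double sum, and assuming the loop above is the intended definition (so the exponent follows the code, 1 + alpha), the quantity can be sketched as follows. The symbols f_i and v_i are shorthand introduced here for frequency[i] and value[i], and n is the number of rows.

```latex
P = K \sum_{i=1}^{n} \sum_{j=1}^{n} f_i^{\,1+\alpha} \, f_j \, \lvert v_i - v_j \rvert
```

The terms with i = j vanish because |v_i - v_i| = 0.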

This code works perfectly well as long as the matrix is small.

However, I am trying to run this routine for a big matrix (5400 x 5400), and the loop never seems to finish.

I already tried to parallelize it with foreach (using %dopar%), but that did not work either.

Is there a smart and concise routine that R can handle? It does not need to follow the above structure, as long as it is efficient.

Thank you very much.

Answer 1

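Try:

# Standardize the values (z-score)
df0$nval <- (df0$value - mean(df0$value)) / sd(df0$value)
# All index pairs with i < j, one pair per column
ij <- combn(nrow(df0), 2)
foo <- sum(df0$frequency[ij[1, ]] ^ (1 + alpha) * df0$frequency[ij[2, ]] * abs(df0$nval[ij[1, ]] - df0$nval[ij[2, ]]))
# Double to account for the mirrored (j, i) pairs; exact only when alpha = 0
P <- K * 2 * foo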

Reasoning: You are evaluating every ordered pair (i, j) of rows, combining their frequencies and normalized values. We use combn to create half of those pairs (the ones with i < j) and then vectorize the whole computation. Since combn only gives unique combinations, we need to multiply by 2. [Keep in mind that we don't need the values on the diagonal, as abs(df0$value[i] - df0$value[i]) is equal to 0, so the only pairs we are missing are the mirrored ones (j, i); that's why we multiply by 2. In general the mirrored term equals the original only when alpha = 0, as in the example, because frequency[i]^(1+alpha) * frequency[j] is symmetric in i and j only then.] We then multiply by K and get P.
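To make the doubling argument concrete, here is a minimal sketch that compares three ways of computing P on the question's example data. It deliberately uses the raw value column rather than a normalized one, so the result can be checked against the question's double loop. The helper names p_loop, p_combn, and p_outer are hypothetical, introduced only for this illustration, and the outer()-based variant is a swapped-in alternative rather than part of the answer above.

```r
# Example data from the question
df0 <- data.frame(value     = c(1, 2, 2, 3, 1, 3),
                  frequency = c(0.4, 0.2, 0.2, 0.1, 0.4, 0.1))

# Reference: explicit double loop over all ordered pairs (i, j)
p_loop <- function(v, f, K, alpha) {
  total <- 0
  for (i in seq_along(v)) {
    for (j in seq_along(v)) {
      total <- total + f[i]^(1 + alpha) * f[j] * abs(v[i] - v[j])
    }
  }
  K * total
}

# combn-based: only pairs with i < j, then doubled.
# The doubling relies on the term being symmetric, i.e. alpha = 0.
p_combn <- function(v, f, K, alpha) {
  ij <- combn(length(v), 2)
  i <- ij[1, ]
  j <- ij[2, ]
  K * 2 * sum(f[i]^(1 + alpha) * f[j] * abs(v[i] - v[j]))
}

# outer-based: builds the full n x n matrices, so no symmetry is assumed
p_outer <- function(v, f, K, alpha) {
  weights <- outer(f^(1 + alpha), f)   # weights[i, j] = f[i]^(1+alpha) * f[j]
  dists   <- abs(outer(v, v, "-"))     # dists[i, j]   = |v[i] - v[j]|
  K * sum(weights * dists)
}

K <- 1/2
res_loop  <- p_loop(df0$value, df0$frequency, K, alpha = 0)
res_combn <- p_combn(df0$value, df0$frequency, K, alpha = 0)
res_outer <- p_outer(df0$value, df0$frequency, K, alpha = 0)
print(c(loop = res_loop, combn = res_combn, outer = res_outer))
```

With alpha = 0 all three agree (0.72 on this data). For alpha different from 0 only the loop and the outer() variant match each other. The trade-off is memory: outer() materializes full n x n matrices, which for n = 5400 is roughly 233 MB of doubles per matrix.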

It's not clear how you want to normalize, so I just subtracted the mean and divided by the standard deviation. If you meant something else, you can change it accordingly.

Edit 1: Big thanks to @alexis_laz for finding a mistake and suggesting improvements that almost double the speed!

Edit 2: Adjusted the script to fit the changed requirements.
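As an illustration of swapping the normalization, the sketch below puts the z-score used above next to a min-max rescaling. The min-max variant is purely an assumption about what "something else" might mean, and the variable names z_score and min_max are hypothetical.

```r
value <- c(1, 2, 2, 3, 1, 3)

# z-score: subtract the mean, divide by the sample standard deviation
z_score <- (value - mean(value)) / sd(value)

# min-max (assumed alternative): rescale linearly to the interval [0, 1]
min_max <- (value - min(value)) / (max(value) - min(value))

print(round(z_score, 3))
print(min_max)
```

Either vector can take the place of nval in the pairwise computation; only the scale of the absolute differences changes.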

Answer 1: 3, 2016-02-01 20:31:51
