How to replicate the same function over the rows in a matrix

I am trying to write a loop that determines which cell in each row has the greatest value and returns the matching "High", "Medium", or "Low" string as the result. Here is some data to try it out.

data <- matrix(c(0.3000003,0.3299896,0.3700101,
                 0.3299896,0.3700101,0.3000003,
                 0.3700101,0.3000003,0.3299896,
                 0.3000003,0.3299896,0.3700101,
                 0.3299896,0.3700101,0.3000003,
                 0.3700101,0.3000003,0.3299896),6,3)
colnames(data) <- c("Low","Medium","High")
rownames(data) <- paste("case",1:6)

> data
             Low    Medium      High
case 1 0.3000003 0.3700101 0.3299896
case 2 0.3299896 0.3000003 0.3700101
case 3 0.3700101 0.3299896 0.3000003
case 4 0.3299896 0.3000003 0.3700101
case 5 0.3700101 0.3299896 0.3000003
case 6 0.3000003 0.3700101 0.3299896

I am using this function, but it seems to be calculating only the first row.

assign.levels <- function(data) {

  for (i in nrow(data)) {

    scored.thetas.1 <- names(which.max(data[i,1:3])) ## I wrote 1:3 here because I have multiple columns in the original dataset.
    return(scored.thetas.1)

  }
}


> assign.levels(data)
[1] "Medium"

Any thoughts?

Thanks in advance!

Here's a vectorized solution that you may prefer:

colnames(data)[apply(data, 1, which.max)]
# [1] "Medium" "High"   "Low"    "High"   "Low"    "Medium"

That's a concise version of your attempt: apply the function which.max to each row (dimension 1) of data and get the corresponding column name.
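To make the two steps explicit, here is a minimal sketch that splits the one-liner apart. It rebuilds the question's matrix so that it runs on its own; the names idx and levels_by_row are introduced only for illustration.

```r
# Same toy matrix as in the question (matrix() fills column by column).
data <- matrix(c(0.3000003, 0.3299896, 0.3700101, 0.3299896, 0.3700101, 0.3000003,
                 0.3700101, 0.3000003, 0.3299896, 0.3000003, 0.3299896, 0.3700101,
                 0.3299896, 0.3700101, 0.3000003, 0.3700101, 0.3000003, 0.3299896),
               nrow = 6, ncol = 3)
colnames(data) <- c("Low", "Medium", "High")
rownames(data) <- paste("case", 1:6)

# Step 1: for each row (MARGIN = 1), the column position of the largest value.
idx <- apply(data, 1, which.max)

# Step 2: use those positions to index the column names.
levels_by_row <- colnames(data)[idx]
print(levels_by_row)
# [1] "Medium" "High"   "Low"    "High"   "Low"    "Medium"
```

Keeping idx around can be handy if you also need the column positions, not just the labels.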

In terms of your attempt, here's a corrected version:

assign.levels <- function(data) {
  scored.thetas.1 <- rep(NA, nrow(data))
  for (i in 1:nrow(data))
    scored.thetas.1[i] <- names(which.max(data[i, ]))
  scored.thetas.1
}
assign.levels(data)
# [1] "Medium" "High"   "Low"    "High"   "Low"    "Medium"

Several things to mention about your attempt:
1) You were iterating with i in nrow(data), but nrow(data) is just a single number, so you were looking only at the last row.
2) You kept redefining the same variable scored.thetas.1 in every iteration (in this case there was only one iteration, but it is a bad habit).
3) A loop is not a function: you don't need to return anything from it (calling return() inside a loop exits the enclosing function immediately). Instead, you most likely want to store the newly obtained values somewhere.
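The first and third points can be checked directly. The sketch below uses a hypothetical 6 x 3 matrix m and hypothetical helper names (visited_wrong, visited_right, first_only) to record which row indices each loop header actually visits, and to show that return() inside a loop ends the function on the first pass.

```r
m <- matrix(1:18, nrow = 6, ncol = 3)  # hypothetical 6 x 3 matrix

# nrow(m) is the single number 6, so this loop body runs exactly once, with i = 6.
visited_wrong <- integer(0)
for (i in nrow(m)) visited_wrong <- c(visited_wrong, i)

# seq_len(nrow(m)) is 1, 2, ..., 6, so this loop body runs once per row.
visited_right <- integer(0)
for (i in seq_len(nrow(m))) visited_right <- c(visited_right, i)

# return() inside a loop exits the whole function during the first iteration.
first_only <- function(n) {
  for (i in seq_len(n)) return(i)
}

print(visited_wrong)   # [1] 6
print(visited_right)   # [1] 1 2 3 4 5 6
print(first_only(6))   # [1] 1
```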

In comparison, note that I first define a vector scored.thetas.1 of length nrow(data), filled with NA. Then I iterate over all the rows (1:nrow(data)) and store the value for each row in scored.thetas.1[i].
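One caveat with this pattern: 1:nrow(data) evaluates to c(1, 0) when the matrix has no rows, so the loop would try to read rows that do not exist. A possible variant, sketched here under the assumption that you want an empty result in that case, preallocates with character() and iterates with seq_len(). The function name assign_levels and the test matrices m0 and m2 are made up for this example.

```r
assign_levels <- function(m) {
  out <- character(nrow(m))            # preallocate one slot per row
  for (i in seq_len(nrow(m))) {        # seq_len(0) is empty, so zero rows means zero iterations
    out[i] <- colnames(m)[which.max(m[i, ])]
  }
  out
}

# Hypothetical inputs: a zero-row matrix and a small 2 x 3 matrix.
m0 <- matrix(numeric(0), nrow = 0, ncol = 3,
             dimnames = list(NULL, c("Low", "Medium", "High")))
m2 <- matrix(c(1, 9, 5, 2, 3, 4), nrow = 2,
             dimnames = list(NULL, c("Low", "Medium", "High")))

print(assign_levels(m0))  # character(0)
print(assign_levels(m2))  # [1] "Medium" "Low"
```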

This should be fast:

colnames(data)[max.col(data)]
#[1] "Medium" "High"   "Low"    "High"   "Low"    "Medium"
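One difference from which.max is worth knowing: by default max.col breaks ties at random (ties.method = "random"), whereas which.max always returns the first maximum. Its documentation also describes ties as being decided with a small relative tolerance, so nearly equal values may count as tied. If you want results that are reproducible and match the apply version on tied rows, passing ties.method = "first" is one option. The matrix m below is a made-up example with exact ties.

```r
# Hypothetical matrix with exact ties: row 1 ties Low/Medium, row 2 ties Medium/High.
m <- matrix(c(0.5, 0.2,
              0.5, 0.7,
              0.1, 0.7), nrow = 2,
            dimnames = list(NULL, c("Low", "Medium", "High")))

# Deterministic tie-breaking: take the first (leftmost) maximum in each row.
first_max <- colnames(m)[max.col(m, ties.method = "first")]

# Or take the last (rightmost) maximum in each row.
last_max <- colnames(m)[max.col(m, ties.method = "last")]

# which.max also returns the first maximum, so it agrees with "first".
apply_max <- colnames(m)[apply(m, 1, which.max)]

print(first_max)  # [1] "Low"    "Medium"
print(last_max)   # [1] "Medium" "High"
print(apply_max)  # [1] "Low"    "Medium"
```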

Here is a small benchmark.

n <- 1e6
set.seed(1)
data <- matrix(runif(n * 3), ncol = 3)
colnames(data) <- c("Low","Medium","High")

library(microbenchmark)
library(ggplot2)  # provides the autoplot() generic used below

benchmark <- microbenchmark(
  OP = assign.levels(data), # as defined in Julius's answer
  Julius = colnames(data)[apply(data, 1, which.max)],
  markus = colnames(data)[max.col(data)], times = 20
)

autoplot(benchmark)

