简体   繁体   English

索引R中矩阵的元素

[英]Indexing the elements of a matrix in R

The problem is pretty silly, but I am wondering if I am missing something. 问题很愚蠢,但我想知道我是否遗漏了一些东西。 Let's say that there is a vector k that contains some numbers, say 比方说,有一个包含一些数字的向量k

> k
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

I want to transform this to a matrix 我想将其转换为矩阵

> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    0    6    7    8    9
[3,]    0    0   10   11   12
[4,]    0    0    0   13   14
[5,]    0    0    0    0   15

My first idea was to use something with upper.tri() , for example like m[upper.tri(m, diag = TRUE)] <- k , but that will not give the matrix above. 我的第一个想法是使用upper.tri()东西,例如m[upper.tri(m, diag = TRUE)] <- k ,但是这不会给出上面的矩阵。

Is there a more intelligent solution to this? 有更聪明的解决方案吗? Below there's my solution but let's just say I am not too proud of it. 下面是我的解决方案,但我们只是说我并不为此感到骄傲。


rows <- rep(1:5, 5:1)

cols1 <- rle(rows)$lengths


cols <- do.call(c, lapply(1:length(cols1), function(x) x:5))

for(i in 1:length(k)) {
  m[rows[i], cols[i]] <- k[i]
}

Here's an option using lower.tri and t to transpose the result: 这是一个使用lower.trit来转置结果的选项:

k <- 1:15
m <- matrix(0, 5,5)
m[lower.tri(m, diag = TRUE)] <- k
m <- t(m)
m 
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    2    3    4    5
#[2,]    0    6    7    8    9
#[3,]    0    0   10   11   12
#[4,]    0    0    0   13   14
#[5,]    0    0    0    0   15

Microbenchmark 微基准

Since there was some confusion with Joseph's benchmark, here's another one. 由于与约瑟夫的基准有一些混淆,这是另一个。 I tested the three solutions for matrices of size 10*10; 我测试了三种尺寸为10 * 10的矩阵的解决方案; 100*100; 100 * 100; 1000*1000; 1000 * 1000; 10000*10000. 10000 * 10000。

Results: 结果:

PIC

Apparently, the performance depends heavily on the size of the matrix. 显然,性能在很大程度上取决于矩阵的大小。 For large matrices, Joseph's answer performs fastest, while for smaller matrices, mine was the fastest approach. 对于大型矩阵,Joseph的答案表现最快,而对于较小的矩阵,我的答案最快。 Note that this doesn't take memory efficiency into account. 请注意,这不会考虑内存效率。

Reproducible benchmark: 可重复的基准:

Joseph <- function(k, n) {
  y <- 1L
  t <- rep(0L,n)
  j <- c(y, sapply(1:(n-1L), function(x) y <<- y+(n+1L)-x))
  t(vapply(1:n, function(x) c(rep(0L,x-1L),k[j[x]:(j[x]+n-x)]), t, USE.NAMES = FALSE))
}

Frank <- function(k, n) {
  m = matrix(0L, n, n)
  m[ which(lower.tri(m, diag=TRUE), arr.ind=TRUE)[, 2:1] ] = k
  m
}

docendo <- function(k,n) {
  m <- matrix(0L, n, n)
  m[lower.tri(m, diag = TRUE)] <- k
  t(m)
}

library(microbenchmark)
library(data.table)
library(ggplot2)
n <- c(10L, 100L, 1000L, 10000L)
k <- lapply(n, function(x) seq.int((x^2 + x)/2))

b <- lapply(seq_along(n), function(i) {
  bm <- microbenchmark(Joseph(k[[i]], n[i]), Frank(k[[i]], n[i]), docendo(k[[i]], n[i]), times = 10L)
  bm$n <- n[i]
  bm
})

b1 <- rbindlist(b)

ggplot(b1, aes(expr, time)) +
  geom_violin() +
  facet_wrap(~ n, scales = "free_y") +
  ggtitle("Benchmark for n = c(10L, 100L, 1000L, 10000L)")

Check equality of results: 检查结果的相等性:

all.equal(Joseph(k[[1]], n[1]), Frank(k[[1]], n[1]))
#[1] TRUE
all.equal(Joseph(k[[1]], n[1]), docendo(k[[1]], n[1]))
#[1] TRUE

Note: I didn't include George's approach in the comparison since, judging by Joseph's results, it seems to be a lot slower. 注意:我没有在比较中包含乔治的方法,因为根据约瑟夫的结果判断,它似乎要慢得多。 So all approaches compared in my benchmark are written only in base R. 所以在我的基准测试中比较的所有方法都只写在基础R中。

A variation on @docendodiscimus' answer: Instead of transposing you can change row and col indices, which you get by wrapping lower.tri in which : @docendodiscimus'答案的一个变体:你可以通过包装lower.tri来改变行和col索引,而不是转置你可以改变which

n = 5
m = matrix(0, n, n)

m[ which(lower.tri(m, diag=TRUE), arr.ind=TRUE)[, 2:1] ] = seq(sum(seq(n)))


     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    0    6    7    8    9
[3,]    0    0   10   11   12
[4,]    0    0    0   13   14
[5,]    0    0    0    0   15

To understand how it works, look at the left-hand side in steps: 要了解它的工作原理,请按步骤查看左侧:

  • lower.tri(m, diag=TRUE)
  • which(lower.tri(m, diag=TRUE), arr.ind=TRUE)
  • which(lower.tri(m, diag=TRUE), arr.ind=TRUE)[, 2:1]

I guess transposing might be costly if the matrix is large, which is why I'd consider this option. 我想如果矩阵很大,转置可能会很昂贵,这就是为什么我会考虑这个选项。 Note: Joseph Wood's answer suggests that I am wrong, since the transposing way is faster in his benchmark. 注意:Joseph Wood的回答表明我错了,因为他的基准测试中的转置方式更快。


(Thanks to @JosephWood:) Instead of enumerating and summing with sum(seq(n)) , you can use (n^2 - n)/2 + n . (感谢@JosephWood :)你可以使用(n^2 - n)/2 + n而不是用sum(seq(n))枚举和求和。

library(miscTools)
k <- 1:15
triang(k, 5)

Here is a really fast base R solution: 这是一个非常快速的基础R解决方案:

Update 更新

I've modified the code slightly so that I call only vapply one time instead of the sapply/vapply combo I had before (I also got rid of USE.NAMES=FALSE as it appears to not make any difference). 我稍微修改了代码,所以我只调用一次vapply而不是之前的sapply/vapply组合(我也摆脱了USE.NAMES=FALSE因为它似乎没有任何区别)。 Although this is a bit cleaner, it didn't change the timing significantly on my machine (I reran docendo's benchmarks with plots and it appears to be nearly the same). 虽然这有点干净,但它并没有在我的机器上显着改变时间(我重新调整了docendo的基准测试图,它看起来几乎相同)。

Triangle1 <- function(k,n) {
    y <- -n
    r <- rep(0L,n)
    t(vapply(1:n, function(x) {y <<- y+n+2L-x; c(rep(0L,x-1L),k[y:(y+n-x)])}, r))
}

Here are some timings: 以下是一些时间安排:

Triangle2 <- function(k,n) {
    m <- matrix(0, n,n)
    m[lower.tri(m, diag = TRUE)] <- k
    t(m)
}

Triangle3 <- function(k, n) {
    m = matrix(0, n, n)
    m[ which(lower.tri(m, diag=TRUE), arr.ind=TRUE)[, 2:1] ] = k   ## seq(sum(seq(n)))  for benchmarking
    m
}

k2 <- 1:50005000
n2 <- 10^4

system.time(t1 <- Triangle1(k2,n2))
user  system elapsed           ## previously   user  system elapsed
2.29    0.08    2.41           ##              2.37    0.13    2.52

system.time(t2 <- Triangle2(k2,n2))
user  system elapsed 
5.40    0.91    6.30

system.time(t3 <- Triangle3(k2,n2))
user  system elapsed 
7.70    1.03    8.77 

system.time(t4 <- triang(k2,n2))
user  system elapsed 
433.45    0.20  434.88

One thing that is a bit puzzling to me is that the object produced by Triangle1 is half the size of all the other solutions. 对我来说有点令人费解的是, Triangle1生成的对象是所有其他解决方案的一半。

object.size(t1)
400000200 bytes

object.size(t2)   ## it's the same for t3 and t4
800000200 bytes

When I do some checks, it only gets more confusing. 当我做一些检查时,它只会变得更加混乱。

all(sapply(1:ncol(t1), function(x) all(t1[,x]==t2[,x])))
[1] TRUE

class(t1)
[1] "matrix"
class(t2)
[1] "matrix"

attributes(t1)
$dim
[1] 10000 10000
attributes(t2)
$dim
[1] 10000 10000

## not sure what's going on here
identical(t1,t2)
[1] FALSE

identical(t2,t3)
[1] TRUE

As @Frank pointed out in the comments, t1 is an integer matrix whereas the others are numeric. 正如@Frank在评论中指出的那样, t1是整数矩阵,而其他是数字。 I should have known this as one of the most important R functions would have told me this information from the outset. 我应该知道这是最重要的R函数之一从一开始就告诉我这些信息。

str(t1)
int [1:10000, 1:10000] 1 0 0 0 0 0 0 0 0 0 ...
str(t2)
num [1:10000, 1:10000] 1 0 0 0 0 0 0 0 0 0 ...

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM