Identify rows that meet condition and store in matrix

I need to identify rows of a matrix that meet a condition. I set the problem up as follows. Overall, the goal is to identify 1) the top two entries in a particular column and 2) the rows these correspond to. I then want to store the respective rows in a 2 x n matrix.

Mat1 <- data.frame(matrix(nrow = 10, ncol =250, data = rnorm(250,0,1)))
seq1 <- seq(1, 247,3)

 Mat1[,1:4]
            X1         X2          X3           X4
1   0.39560216 -1.2391890  1.00771944 -0.225181006
2  -0.92136335 -0.5042209  0.51758214 -0.008936688
3  -0.67657261  1.3167817 -0.22997139 -1.478361654
4  -1.94389531  0.7944302 -0.16763378 -1.847748926
5   0.11998316  0.4850342 -2.47604164 -0.846030811
6   1.26607727  2.3710318 -0.60115423  1.255747735
7  -1.09798680 -0.2817050  0.03150861 -1.350501958
8   0.43790646  0.1989955  1.22612459  0.323815132
9   0.61639304  0.8102352 -0.69921481  0.118795023
10  0.01786964 -0.1222586 -1.50414879  0.649616182


So in column 1 (seq1[1]) the top two entries are 1.266077 and 0.616393. These correspond to rows 6 and 9. In column 4 (seq1[2]) the top two entries are 1.2557477 and 0.6496162. These correspond to rows 6 and 10. I want to repeat this process for all elements in seq1 and store the output in a matrix (say Output) that is 2 x length(seq1). The first row should correspond to the maximum value, the second row to the second-highest value.

You could try something like this:

set.seed(2) # fix the random numbers for reproducibility
Mat1 <- data.frame(matrix(nrow = 10, ncol =250, data = rnorm(250,0,1)))
seq1 <- seq(1, 247,3)

# select the interesting columns
Mat2 <- Mat1[,c(seq1)]

# create a matrix with the row names of the top 2 values for each interesting column
dat <- sapply(Mat2, function(x) head(row.names(Mat2)[order(x, decreasing = TRUE)], 2))
class(dat)
[1] "matrix"

dat[,1:4]
     X1  X4  X7  X10
[1,] "9" "3" "2" "7"
[2,] "3" "1" "5" "2"
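Note that dat holds row names, so its entries are character strings. If numeric row positions are needed, a minimal sketch of the conversion follows. It assumes the data frame still has its default row names ("1", "2", ...), so that names and positions coincide; the small matrix below is a hypothetical stand-in for dat, and idx is a name introduced here for illustration.

```r
# Hypothetical stand-in for `dat`: top-two row names per column, as characters
dat <- matrix(c("6", "9", "6", "10"), nrow = 2,
              dimnames = list(NULL, c("X1", "X4")))

# Copy, then change the storage mode; dimensions and column names are kept
idx <- dat
storage.mode(idx) <- "integer"
idx
```

With non-default row names this conversion would not be meaningful, and the order()-based indices are the safer choice.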

You can get the indices with sapply, order, and subsetting ([1:2]):

tt <- sapply(Mat1[,seq1], function(x) order(x, decreasing = TRUE)[1:2])
#or
tt <- sapply(Mat1[,seq1], order, decreasing = TRUE)[1:2,]

and the values with:

matrix(Mat1[matrix(c(tt, rep(seq1, each=2)), ncol = 2)], 2)
#or
sapply(Mat1[,seq1], function(x) sort(x, decreasing = TRUE)[1:2])

You can get the indices of all rows except the two largest with:

sapply(Mat1[,seq1], order, decreasing = TRUE)[-(1:2),]
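As a check, the same calls can be run on the four columns printed in the question. This is only a sketch: Mat and cols are hypothetical stand-ins for Mat1 and seq1, restricted to the columns whose values are shown.

```r
# The four columns printed in the question
Mat <- data.frame(
  X1 = c(0.39560216, -0.92136335, -0.67657261, -1.94389531, 0.11998316,
         1.26607727, -1.09798680, 0.43790646, 0.61639304, 0.01786964),
  X2 = c(-1.2391890, -0.5042209, 1.3167817, 0.7944302, 0.4850342,
         2.3710318, -0.2817050, 0.1989955, 0.8102352, -0.1222586),
  X3 = c(1.00771944, 0.51758214, -0.22997139, -0.16763378, -2.47604164,
         -0.60115423, 0.03150861, 1.22612459, -0.69921481, -1.50414879),
  X4 = c(-0.225181006, -0.008936688, -1.478361654, -1.847748926, -0.846030811,
         1.255747735, -1.350501958, 0.323815132, 0.118795023, 0.649616182)
)
cols <- c(1, 4)  # stand-in for seq1: the two columns discussed in the question

# Row indices of the two largest entries in each selected column
tt <- sapply(Mat[, cols], function(x) order(x, decreasing = TRUE)[1:2])

# The matching values, once via sort() and once via matrix indexing
vals  <- sapply(Mat[, cols], function(x) sort(x, decreasing = TRUE)[1:2])
vals2 <- matrix(Mat[cbind(c(tt), rep(cols, each = 2))], 2)

tt    # column X1: rows 6 and 9; column X4: rows 6 and 10
vals
```

Both ways of extracting the values give the same numbers; the sort() version keeps the column names, while the matrix-indexing version reuses the indices already stored in tt.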

You can do:

M <- read.table(header=TRUE, text=
"X1         X2          X3           X4
0.39560216 -1.2391890  1.00771944 -0.225181006
-0.92136335 -0.5042209  0.51758214 -0.008936688
-0.67657261  1.3167817 -0.22997139 -1.478361654
-1.94389531  0.7944302 -0.16763378 -1.847748926
0.11998316  0.4850342 -2.47604164 -0.846030811
1.26607727  2.3710318 -0.60115423  1.255747735
-1.09798680 -0.2817050  0.03150861 -1.350501958
0.43790646  0.1989955  1.22612459  0.323815132
0.61639304  0.8102352 -0.69921481  0.118795023
0.01786964 -0.1222586 -1.50414879  0.649616182")

M <- as.matrix(M)
M

# return the positions of the largest and second-largest entries of x
my12 <- function(x) { m <- which.max(x); x[m] <- -Inf; c(m, which.max(x)) }
apply(M, 2, my12)
# > apply(M, 2, my12)
#      X1 X2 X3 X4
# [1,]  6  6  8  6
# [2,]  9  3  1 10

To get the values (e.g. the maxima):

I <- apply(M, 2, my12)
M[cbind(I[1,], 1:ncol(M))]

If M is a data frame you can do sapply(M, my12) ...
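Applied to the setup from the question, the data-frame variant might look like the sketch below. It assumes Output is meant to hold row indices (the question names the matrix but leaves its contents open to interpretation); OutputValues is a hypothetical companion matrix added here for the corresponding values.

```r
set.seed(2)  # any seed works; fixed only so the run is repeatable
Mat1 <- data.frame(matrix(nrow = 10, ncol = 250, data = rnorm(250, 0, 1)))
seq1 <- seq(1, 247, 3)

# positions of the largest and second-largest entries of x
my12 <- function(x) { m <- which.max(x); x[m] <- -Inf; c(m, which.max(x)) }

# 2 x length(seq1): row 1 = row index of the maximum, row 2 = of the runner-up
Output <- sapply(Mat1[, seq1], my12)

# Same shape, holding the values found at those positions
OutputValues <- sapply(Mat1[, seq1], function(x) x[my12(x)])

dim(Output)
```

When a column contains ties, which.max() picks the first occurrence, so the result can differ from an order()-based approach only in which of the tied rows is reported.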
