Select rows of a matrix that meet a condition
In R with a matrix:
one two three four
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 11 18
[4,] 4 9 11 19
[5,] 5 10 15 20
I want to extract the submatrix whose rows have column three = 11. That is:
one two three four
[1,] 1 6 11 16
[3,] 3 8 11 18
[4,] 4 9 11 19
I want to do this without looping. I am new to R, so this is probably very obvious, but the documentation is often somewhat terse.
This is easier to do if you convert your matrix to a data frame using as.data.frame(). In that case the previous answers (using subset or m$three) will work, otherwise they will not.
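A minimal sketch of the data-frame route described above. The matrix m is rebuilt by hand from the values in the question, and the names df, by_dollar, by_subset and result are only illustrative:

```r
# Rebuild the example matrix from the question (values copied from the post).
m <- cbind(one = 1:5, two = 6:10, three = c(11, 12, 11, 11, 15), four = 16:20)

# Convert to a data frame so that $ and subset() can refer to columns by name.
df <- as.data.frame(m)

by_dollar <- df[df$three == 11, ]     # logical row index built with $
by_subset <- subset(df, three == 11)  # same selection via subset()

# Convert back if a matrix is needed downstream.
result <- as.matrix(by_subset)
```

Both selections should contain the three rows where three equals 11.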
To perform the operation on a matrix, you can select the column by name:
m[m[, "three"] == 11,]
Or by number:
m[m[,3] == 11,]
Note that if only one row matches, the result is an integer vector, not a matrix.
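If a matrix is needed even when a single row matches, the drop argument of the [ operator can be set to FALSE. A short sketch, again rebuilding m by hand from the question and using illustrative variable names:

```r
m <- cbind(one = 1:5, two = 6:10, three = c(11, 12, 11, 11, 15), four = 16:20)

# Only one row has three == 15, so default indexing drops to a named vector.
as_vector <- m[m[, "three"] == 15, ]

# drop = FALSE keeps the result as a one-row matrix.
as_matrix <- m[m[, "three"] == 15, , drop = FALSE]

is.matrix(as_vector)
is.matrix(as_matrix)
dim(as_matrix)
```

Using drop = FALSE routinely avoids special-casing the single-match situation in later code.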
I will choose a simple approach using the dplyr package.
If the data frame is called data:
library(dplyr)
result <- filter(data, three == 11)
m <- matrix(1:20, ncol = 4)
colnames(m) <- letters[1:4]
The following command will select the first row of the matrix above.
subset(m, m[,4] == 16)
And this will select the last three.
subset(m, m[,4] > 17)
The result will be a matrix in both cases. If you want to use column names to select columns, then you would be best off converting it to a data frame with
mf <- data.frame(m)
Then you can select with
mf[mf$d == 16, ]
Or, you could use the subset command.
Subset is a very slow function, and I personally find it useless.
I assume you have a data.frame, array, or matrix called Mat with A, B, C as column names; then all you need to do is:
In the case of one condition on one column, let's say column A:
Mat[which(Mat[,'A'] == 10), ]
In the case of multiple conditions on different columns, you can create a dummy variable. Suppose the conditions are A = 10, B = 5, and C > 2, then we have:
aux = which(Mat[,'A'] == 10)
aux = aux[which(Mat[aux,'B'] == 5)]
aux = aux[which(Mat[aux,'C'] > 2)]
Mat[aux, ]
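The same selection can also be written as a single logical vector combined with &, which avoids the intermediate index variable. This is an alternative sketch, not the method used above, and the contents of Mat are made up for illustration:

```r
# Hypothetical example data; the column names A, B, C follow the answer above.
Mat <- cbind(A = c(10, 10, 3, 10),
             B = c(5, 5, 5, 1),
             C = c(1, 7, 9, 4))

# One logical value per row: TRUE only where all three conditions hold.
keep <- Mat[, "A"] == 10 & Mat[, "B"] == 5 & Mat[, "C"] > 2

# drop = FALSE keeps a matrix even if only one row matches.
selected <- Mat[keep, , drop = FALSE]
```

One difference to keep in mind: which() silently discards NA comparisons, whereas indexing with a logical vector that contains NA returns rows of NA, so the which() form is the safer choice when the columns may contain missing values.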
By testing the speed advantage with system.time, the which method is 10x faster than the subset method.
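A rough sketch of how such a comparison could be run. The matrix size and contents are arbitrary assumptions, and the measured ratio will depend on the data, the machine, and the R version, so it may not reproduce the figure quoted above:

```r
set.seed(1)

# Hypothetical test matrix: 100,000 rows, three columns named A, B, C.
Mat <- matrix(sample(1:20, 3e5, replace = TRUE), ncol = 3,
              dimnames = list(NULL, c("A", "B", "C")))

# Time the which() approach and the subset() approach on the same condition.
t_which  <- system.time(r1 <- Mat[which(Mat[, "A"] == 10), , drop = FALSE])
t_subset <- system.time(r2 <- subset(Mat, Mat[, "A"] == 10))

# Both approaches should return the same rows.
all.equal(r1, r2)

t_which["elapsed"]
t_subset["elapsed"]
```

A single system.time call is noisy for operations this fast, so repeating each expression many times gives a more trustworthy comparison.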
If your matrix is called m, just use (note that $ only works when m is a data frame):
R> m[m$three == 11, ]
If the dataset is called data, then all rows where the value of column 'pm2.5' is greater than 300 can be retrieved with:
data[data['pm2.5'] > 300, ]