
Selecting rows in R based on threshold

In R, I have a matrix with N columns, all numeric. (Each row has a name, but that's irrelevant.) I'd like to return the rows in which at least one column has a value greater than some threshold. Right now, I'm doing something like this:

THRESHOLD <- 10
#  my_matrix[,1] can be ignored
my_matrix <- subset (my_matrix, my_matrix[,1] > THRESHOLD | my_matrix[,2] > THRESHOLD | ... )

It seems odd to have to list each column manually. Also, if the number of input columns changes, I have to rewrite this.

There has to be a better way, but I can't figure out what I should be looking for.

I can convert my matrix to a data frame, if that is easier... Any suggestions would be appreciated!

Use apply to find the rows that contain any value greater than the threshold, then use the resulting logical vector to extract those rows from mat.

mat[apply( mat2, 1, function( x ) any( x > threshold ) ), ]
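As a self-contained illustration of the same pattern on an all-numeric matrix with row names, as described in the question, here is a minimal sketch. The names scores, cutoff, keep_apply, keep_rowsums and selected, and the values themselves, are made up for this example. It also shows rowSums as a vectorized alternative to apply; that alternative is an addition here, not part of the original answer.

```r
# Hypothetical numeric matrix with row names (values chosen for illustration)
scores <- matrix(c( 1, 12,  3,
                    4,  5,  6,
                   20,  2,  9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("r1", "r2", "r3"), c("x", "y", "z")))
cutoff <- 10

# apply() over rows: TRUE where at least one value exceeds the cutoff
keep_apply <- apply(scores, 1, function(x) any(x > cutoff))

# Vectorized alternative: count the values above the cutoff in each row
keep_rowsums <- rowSums(scores > cutoff) > 0

# drop = FALSE keeps the result a matrix even if only one row matches
selected <- scores[keep_apply, , drop = FALSE]
print(selected)
```

Neither version names individual columns, so nothing needs rewriting if the number of columns changes.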

EDIT:

Breakdown of the single line above:

# create sample data by simulating samples from standard normal distribution
set.seed(1L)   # seed the random number generator so the simulated data are reproducible

mat <- matrix( data = c(letters[1:3], as.character( rnorm(9, mean = 0, sd = 1))),
               byrow = FALSE, 
               nrow = 3, 
               ncol = 4 ) # create simulated data matrix

threshold <- 0  # set threshold

mat2 <- apply( mat[, 2:ncol(mat) ], 2, as.numeric )  # extract columns 2 to end and convert to numeric

# Get a logical index (TRUE or FALSE) per row: TRUE if the row has any value greater than the threshold (0)
row_indices <- apply( mat2, 1, function( x ) any( x > threshold ) )

mat[row_indices, ]  # extract the rows of mat that are TRUE in row_indices
#     [,1]                 [,2]                 [,3]                 [,4]               
# [1,] "a"  "-0.626453810742332" "1.59528080213779"   "0.487429052428485"
# [2,] "b"  "0.183643324222082"  "0.329507771815361"  "0.738324705129217"
# [3,] "c"  "-0.835628612410047" "-0.820468384118015" "0.575781351653492"

Note:

In your question, you mentioned that the first column is character and the rest are numbers. A matrix can hold only one data type, so I assume that your data matrix is a character matrix. You can check this with typeof(mat) or is.character(mat); class(mat) only reports that the object is a matrix, not the type of its elements. If it is a character matrix, extract columns 2 to the end and convert them to numeric. Then use the result in the apply call to check for any values greater than the threshold.
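The check-and-convert step described in the note could look like the sketch below. The names char_mat, limit, num_part, hits and picked are hypothetical, and the data are invented. Setting storage.mode is used here as one possible way to convert the columns while keeping the matrix shape; it is a swap-in for the apply(..., 2, as.numeric) call used earlier, not the original answer's method.

```r
# Hypothetical character matrix: a label column followed by numbers stored as text
char_mat <- cbind(c("a", "b", "c"),
                  c("-1.5", "0.2", "-3"),
                  c("-0.5", "-0.1", "-2"))
limit <- 0

# Confirm that the elements are stored as character before converting
stopifnot(is.character(char_mat))

# Drop the label column, then change the storage mode to double.
# This keeps the matrix dimensions, even when there is only one row.
num_part <- char_mat[, -1, drop = FALSE]
storage.mode(num_part) <- "double"

# TRUE for each row with at least one value above the limit
hits <- rowSums(num_part > limit) > 0

# Index the original matrix so the label column is kept in the result
picked <- char_mat[hits, , drop = FALSE]
print(picked)
```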

