
How to filter rows based on certain criteria?

I have an example file as follows:

GENES Samp1 Samp2 Samp3 Samp4 Samp5 Samp6 Samp7 Samp8
g1    0.000 0.000 0.000 0.000 0.010 0.000 0.022 0.344
g2    0.700 0.000 0.000 0.000 0.000 0.000 0.000 0.000
g3    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
g4    0.322 0.782 0.000 0.023 0.000 0.000 0.000 0.345
g5    0.010 0.000 0.333 0.000 0.000 0.000 0.011 0.000
g6    0.000 0.000 0.010 0.000 0.000 0.000 0.000 0.000

I need to retrieve the rows (genes) that have 2 or more samples with a value of 0.010 or more. So I should get the following column as a result:

GENES
g1
g4
g5

Can anyone help me with this?

Here's one possible way:

DF <- read.table(text =
"GENES Samp1 Samp2 Samp3 Samp4 Samp5 Samp6 Samp7 Samp8
g1 0.000 0.000 0.000 0.000 0.010 0.000 0.022 0.344
g2 0.700 0.000 0.000 0.000 0.000 0.000 0.000 0.000
g3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
g4 0.322 0.782 0.000 0.023 0.000 0.000 0.000 0.345
g5 0.010 0.000 0.333 0.000 0.000 0.000 0.011 0.000
g6 0.000 0.000 0.010 0.000 0.000 0.000 0.000 0.000", header = TRUE, sep = ' ')


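# For each row, count the samples (all columns except GENES) with a value >= 0.01;
# the row is kept when at least 2 samples qualify
rows <- sapply(seq_len(nrow(DF)), function(i) sum(DF[i, -1] >= 0.01) >= 2)
subSet <- DF[rows, ]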

> subSet
  GENES Samp1 Samp2 Samp3 Samp4 Samp5 Samp6 Samp7 Samp8
1    g1 0.000 0.000 0.000 0.000  0.01     0 0.022 0.344
4    g4 0.322 0.782 0.000 0.023  0.00     0 0.000 0.345
5    g5 0.010 0.000 0.333 0.000  0.00     0 0.011 0.000

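Or similarly, with apply() over the numeric columns only:

# Drop GENES before calling apply(): otherwise the data frame is coerced to a
# character matrix and the values are compared as strings rather than numbers
subSet <- DF[apply(DF[-1], 1, function(x) sum(x >= 0.01) >= 2), ]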

Or, vectorized with rowSums():

subSet <- DF[rowSums(DF[-1] >= 0.01) >= 2, ]
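
The snippets above return the full matching rows, while the question asks only for the GENES column. A minimal sketch of that last step, wrapped in a small function: the name filter_genes and its parameters min_value and min_samples are made up for illustration, and treating NA values as "below the threshold" is an assumption, not something stated in the question.

```r
# Same example data as in the question.
DF <- read.table(text =
"GENES Samp1 Samp2 Samp3 Samp4 Samp5 Samp6 Samp7 Samp8
g1 0.000 0.000 0.000 0.000 0.010 0.000 0.022 0.344
g2 0.700 0.000 0.000 0.000 0.000 0.000 0.000 0.000
g3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
g4 0.322 0.782 0.000 0.023 0.000 0.000 0.000 0.345
g5 0.010 0.000 0.333 0.000 0.000 0.000 0.011 0.000
g6 0.000 0.000 0.010 0.000 0.000 0.000 0.000 0.000", header = TRUE)

# filter_genes() is a hypothetical helper, not a base R function.
# It assumes the first column holds the gene names and all others are numeric.
filter_genes <- function(df, min_value = 0.01, min_samples = 2) {
  # Number of samples per gene at or above min_value.
  # Assumption: NA entries are simply not counted (na.rm = TRUE).
  hits <- rowSums(df[-1] >= min_value, na.rm = TRUE)
  # Keep only the GENES column; drop = FALSE keeps the result a data frame.
  df[hits >= min_samples, "GENES", drop = FALSE]
}

result <- filter_genes(DF)
print(result)
```

With the example data this should list g1, g4 and g5, which matches the expected result in the question. Changing min_value or min_samples adjusts the criteria without touching the filtering logic.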

As you can see, there are many ways to accomplish that :)
