
How to filter genes in a matrix based on a quantile cutoff?

This is a matrix with some example data:

            S1        S2        S3
ARHGEF10L   11.1818   11.0186   11.243
HIF3A       5.2482    5.3847    4.0013
RNF17       4.1956    0         0
RNF10       11.504    11.669    12.0791
RNF11       9.5995    11.398    9.8248
RNF13       9.6257    10.8249   10.5608
GTF2IP1     11.8053   11.5487   12.1228
REM1        5.6835    3.5408    3.5582
MTVR2       0         1.4714    0
RTN4RL2     8.7486    7.9144    7.9795
C16orf13    11.8009   9.7438    8.9612
C16orf11    0         0         0
FGFR1OP2    7.679     8.7514    8.2857
TSKS        2.3036    2.8491    0.4699

I have a matrix "h" with 10,000 genes as row names and 100 samples as columns. I need to select the top 20% most variable genes for clustering, but I'm not sure whether the command I used is right.

For this filtering I used the genefilter R package:

varFilter(h, var.func=IQR, var.cutoff=0.8, filterByQuantile=TRUE)

Is this command right for getting the top 20% most variable genes? And can anyone please explain how this method works statistically?

I haven't used this package myself, but the help file for the function you're using makes the following remark:

IQR is a reasonable variance-filter choice when the dataset is split into two roughly equal and relatively homogeneous phenotype groups. If your dataset has important groups smaller than 25% of the overall sample size, or if you are interested in unusual individual-level patterns, then IQR may not be sensitive enough for your needs. In such cases, you should consider using less robust and more sensitive measures of variance (the simplest of which would be sd).

Since your data has a bunch of small groups, it might be wise to follow this advice and change var.func to var.func = sd.

sd computes the standard deviation, which should be easy to understand.
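A toy illustration of the help file's point, using a made-up gene (not from the question): 100 samples, of which 10 are strongly up-regulated. Because the affected group is smaller than 25% of the samples, both quartiles fall inside the unaffected majority, so IQR sees no spread at all, while sd still picks up the difference.

```r
# Hypothetical gene: 90 samples at baseline, 10 samples (10%) up-regulated
expr <- c(rep(5, 90), rep(12, 10))

# Both the 25th and 75th percentiles are 5, so the IQR is 0
iqr_value <- IQR(expr)

# The standard deviation still reflects the small group (about 2.11)
sd_value <- sd(expr)

iqr_value
sd_value
```

With IQR as var.func, a gene like this would rank among the least variable and be filtered out, even though it separates a subgroup clearly.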

However, this function expects its data in the form of an ExpressionSet object. The error message you got (Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'exprs' for signature '"matrix"') implies that you don't have that, but just a plain matrix instead.

I don't know how to create an ExpressionSet, but I think doing that is overly complicated anyway. So I would suggest going with the code that you posted in the comments:

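# standard deviation of each gene (row) across samples
vars <- apply(h, 1, sd)
# keep genes whose sd is above the 80th percentile, i.e. the top 20%
h[vars > quantile(vars, 0.8), ]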
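As for how this works statistically, here is a minimal sketch on simulated data (the matrix below is made up and only stands in for your "h"). It computes one spread value per gene, finds the 80th percentile of those values, and keeps the genes strictly above it. As far as I can tell from its help file, varFilter with filterByQuantile=TRUE and var.cutoff=0.8 follows the same idea, with var.func supplying the spread measure; treat that as my reading of the documentation rather than a guarantee.

```r
# Simulated stand-in for "h": 1000 genes x 20 samples (hypothetical data)
set.seed(42)
h <- matrix(rnorm(1000 * 20, mean = 8, sd = 2), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("S", 1:20)))

# Per-gene variability: standard deviation across samples (rows = genes)
vars <- apply(h, 1, sd)

# 80th percentile of the per-gene standard deviations
cutoff <- quantile(vars, 0.8)

# Keep genes strictly above the cutoff, i.e. the top 20% most variable;
# drop = FALSE keeps the result a matrix even if only one gene survives
h_top <- h[vars > cutoff, , drop = FALSE]

nrow(h_top)  # 200 of the 1000 genes when there are no ties at the cutoff
```

Because the comparison is strict, genes that are constant across all samples (such as the all-zero C16orf11 row in your example) have sd 0 and are never selected.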
