This is a matrix with some example data:
           S1       S2       S3
ARHGEF10L  11.1818  11.0186  11.243
HIF3A      5.2482   5.3847   4.0013
RNF17      4.1956   0        0
RNF10      11.504   11.669   12.0791
RNF11      9.5995   11.398   9.8248
RNF13      9.6257   10.8249  10.5608
GTF2IP1    11.8053  11.5487  12.1228
REM1       5.6835   3.5408   3.5582
MTVR2      0        1.4714   0
RTN4RL2    8.7486   7.9144   7.9795
C16orf13   11.8009  9.7438   8.9612
C16orf11   0        0        0
FGFR1OP2   7.679    8.7514   8.2857
TSKS       2.3036   2.8491   0.4699
I have a matrix "h" with 10,000 genes as row names and 100 samples as columns. I need to select the top 20% most variable genes for clustering, but I'm not sure whether the command I used is right.
For this filtering I used the genefilter R package:
varFilter(h, var.func=IQR, var.cutoff=0.8, filterByQuantile=TRUE)
Is this command the right way to get the top 20% most variable genes? And can anyone please explain how this method works statistically?
I haven't used this package myself, but the help file for the function you're using makes the following remark:
IQR is a reasonable variance-filter choice when the dataset is split into two roughly equal and relatively homogeneous phenotype groups. If your dataset has important groups smaller than 25% of the overall sample size, or if you are interested in unusual individual-level patterns, then IQR may not be sensitive enough for your needs. In such cases, you should consider using less robust and more sensitive measures of variance (the simplest of which would be sd).
Since your data has a bunch of small groups, it might be wise to follow this advice and change your var.func to var.func = sd. sd computes the standard deviation, which should be easy to understand.
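To make the help file's point concrete, here is a small made-up example (the values are invented for illustration, not taken from your data): one gene measured in 100 samples, where only 10 samples show elevated expression. Because that group is smaller than 25% of the samples, both quartiles fall inside the majority group, so the IQR does not see it, while sd does.

```r
# Hypothetical gene: 90 samples at baseline, 10 samples (10%) elevated.
x <- c(rep(5, 90), rep(10, 10))

iqr_x <- IQR(x)  # 25th and 75th percentiles are both 5, so the IQR is 0
sd_x  <- sd(x)   # the standard deviation still picks up the small group

print(c(IQR = iqr_x, sd = sd_x))
```

With an IQR-based filter this gene would rank among the least variable ones, even though it cleanly separates a subgroup.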
However, this function expects its data in the form of an ExpressionSet object. The error message you got (Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'exprs' for signature '"matrix"') implies that you don't have that, but just a plain matrix instead.
I don't know how to create an ExpressionSet, but I think that doing that is overly complicated anyway. So I would suggest going with the code that you posted in the comments:
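# Standard deviation of each gene (row) across all samples
vars <- apply(h, 1, sd)
# Keep the genes whose sd lies above the 80th percentile, i.e. the top 20%
h[vars > quantile(vars, 0.8), ]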
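If you want to convince yourself that this really keeps the top 20%, here is a self-contained sketch on simulated data. The matrix `toy`, its dimensions, and the simulated spreads are all assumptions for illustration; substitute your own `h`.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical expression matrix: 1000 genes (rows) x 20 samples (columns).
# Each gene gets its own spread so that variability differs between genes.
n_genes   <- 1000
n_samples <- 20
gene_sd   <- runif(n_genes, min = 0.1, max = 3)

# matrix() fills column by column, so the recycled gene_sd lines up with rows.
toy <- matrix(rnorm(n_genes * n_samples, mean = 8, sd = gene_sd),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("S", seq_len(n_samples))))

vars   <- apply(toy, 1, sd)     # one variability score per gene
cutoff <- quantile(vars, 0.8)   # 80th percentile of those scores
top    <- toy[vars > cutoff, ]  # genes strictly above the cutoff

dim(top)
```

Statistically there is not much more to it: every gene gets a single variability score, and only the genes above the 0.8 quantile of those scores are kept. From what the help file describes, varFilter with filterByQuantile=TRUE and var.cutoff=0.8 applies the same idea, just with IQR as the default score. Genes that are 0 in every sample (like C16orf11 in your example) have a score of 0 and are dropped either way.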