I have a data table like
        sample1 sample2 sample3
fruit1       10      20      30
fruit2        1       5       6
fruit3        3       7       8
etc.
I want to find the fruits in the top 1 percent of each sample in R (ranked by the number in each sample). Is there a simple way to do this?
You can lapply over your data and, for each column, subset the rownames of df with a logical vector that is TRUE when the corresponding value in the column is in the top 1 percent (i.e. above the (100 - 1)th percentile).
Create example data
set.seed(2019)
df <- as.data.frame(matrix(sample(1e4, replace = TRUE), 1e3, 10))
names(df) <- paste0('sample', seq_along(df))
rownames(df) <- paste0('fruit', seq_len(nrow(df)))
Step described above:
lapply(df, function(x) rownames(df)[x > quantile(x, (100 - 1)/100)])
# $`sample1`
# [1] "fruit57" "fruit76" "fruit149" "fruit471" "fruit520" "fruit682" "fruit805"
# [8] "fruit949" "fruit966" "fruit975"
#
# $sample2
# [1] "fruit49" "fruit109" "fruit232" "fruit274" "fruit312" "fruit795" "fruit883"
# [8] "fruit884" "fruit955" "fruit958"
#
# $sample3
# [1] "fruit37" "fruit189" "fruit231" "fruit256" "fruit473" "fruit654" "fruit729"
# [8] "fruit742" "fruit820" "fruit979"
#
# ...
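As a minimal sketch, the same step can be wrapped in a small helper so the cutoff is a parameter. The function name top_percent and the argument pct are assumptions made for illustration, not part of the original answer. It is applied here to the three rows shown in the question.

```r
# Hypothetical helper (name and 'pct' argument are illustrative assumptions):
# for each column, return the row names whose value is strictly above
# the (100 - pct)th percentile of that column.
top_percent <- function(df, pct = 1) {
  lapply(df, function(x) rownames(df)[x > quantile(x, 1 - pct / 100)])
}

# The three rows shown in the question
fruits <- data.frame(
  sample1 = c(10, 1, 3),
  sample2 = c(20, 5, 7),
  sample3 = c(30, 6, 8),
  row.names = c("fruit1", "fruit2", "fruit3")
)

res <- top_percent(fruits, pct = 1)
res$sample1
# [1] "fruit1"
```

With only three rows, the 99th percentile is interpolated between the two largest values, so only the largest fruit in each sample passes the strict comparison. Because the test is x > threshold, ties at the threshold are excluded. Switching to >= would include them.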
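Assuming your data frame is called "fruit":
# sort the rows by sample1, largest first (note the trailing comma: rows, not columns)
fruit <- fruit[order(fruit$sample1, decreasing = TRUE), ]
# keep the first 1% of rows, rounded up so at least one row is returned
top.1.percent <- fruit[seq_len(ceiling(nrow(fruit) / 100)), ]
This should do the trick for sample1.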
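The sort-and-slice idea can be extended to every sample at once. The sketch below is one possible way to do that. The helper name top_by_sort and its pct argument are assumptions for illustration, and the example data are the three rows from the question.

```r
# Hypothetical helper (name and 'pct' argument are illustrative assumptions):
# for each column, order the rows from largest to smallest and keep the
# row names of the first ceiling(nrow * pct / 100) entries.
top_by_sort <- function(df, pct = 1) {
  n_top <- ceiling(nrow(df) * pct / 100)
  lapply(df, function(x) {
    rownames(df)[order(x, decreasing = TRUE)[seq_len(n_top)]]
  })
}

# The three rows shown in the question
fruit <- data.frame(
  sample1 = c(10, 1, 3),
  sample2 = c(20, 5, 7),
  sample3 = c(30, 6, 8),
  row.names = c("fruit1", "fruit2", "fruit3")
)

top1  <- top_by_sort(fruit, pct = 1)
top50 <- top_by_sort(fruit, pct = 50)
top50$sample1
# [1] "fruit1" "fruit3"
```

Unlike a quantile comparison, this always returns exactly n_top names per sample (when the data frame has at least that many rows), so tied values at the cutoff are broken by row order rather than all kept or all dropped.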