I have a data frame (population1) with 11 million rows (observations) and 11 columns (individuals). The first few rows look like this:
> head(population1)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 7 3 NA NA 10 NA NA NA NA NA NA
2 14 11 7 NA 12 3 4 5 14 3 6
3 13 11 7 NA 11 4 NA 4 13 3 4
4 3 NA 4 5 4 NA NA 6 17 NA 7
5 3 NA 5 5 4 NA NA 7 20 NA 8
6 6 NA 3 6 NA NA NA 5 16 NA 10
For each individual, I want to estimate the proportion of observations with values greater than 5. Is there an easy way to do this in R?
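Here is a solution that uses sapply to apply a function to each column. The function counts how many observations are larger than 5 and divides that count by the length of x. Note that NA values are dropped from the count (na.rm = TRUE) but still included in the denominator, because length(x) counts every row.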
sapply(dt, function(x) sum(x > 5, na.rm = TRUE)/length(x))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
0.6666667 0.3333333 0.3333333 0.1666667 0.5000000 0.0000000 0.0000000 0.3333333 0.8333333 0.0000000
V11
0.6666667
DATA
dt <- read.table(text = " V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 7 3 NA NA 10 NA NA NA NA NA NA
2 14 11 7 NA 12 3 4 5 14 3 6
3 13 11 7 NA 11 4 NA 4 13 3 4
4 3 NA 4 5 4 NA NA 6 17 NA 7
5 3 NA 5 5 4 NA NA 7 20 NA 8
6 6 NA 3 6 NA NA NA 5 16 NA 10",
header = TRUE)
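Because the sapply() call above divides by length(x), missing values still count toward the denominator. If the proportion should instead be taken over the non-missing observations only, a vectorised variant may be worth considering. The sketch below is an illustration rather than part of the original answer; the names prop_all and prop_obs are made up for this example, and dt is rebuilt from the same sample data so the block runs on its own.

```r
# Same sample data as in the DATA block (6 rows, 11 columns).
dt <- read.table(text = "  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1  7  3 NA NA 10 NA NA NA NA  NA  NA
2 14 11  7 NA 12  3  4  5 14   3   6
3 13 11  7 NA 11  4 NA  4 13   3   4
4  3 NA  4  5  4 NA NA  6 17  NA   7
5  3 NA  5  5  4 NA NA  7 20  NA   8
6  6 NA  3  6 NA NA NA  5 16  NA  10",
header = TRUE)

# dt > 5 yields a logical matrix; NA entries stay NA.

# Denominator = all rows, NA included (same result as the sapply() call).
prop_all <- colSums(dt > 5, na.rm = TRUE) / nrow(dt)

# Denominator = non-missing rows only (hypothetical alternative definition).
prop_obs <- colMeans(dt > 5, na.rm = TRUE)

print(prop_all)
print(prop_obs)
```

Which denominator is appropriate depends on how an NA should be interpreted for an individual. One caveat: a column that is entirely NA would give NaN under the second definition, and dt > 5 materialises a full logical matrix, which may matter with 11 million rows.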
Here is an option using dplyr from the tidyverse:
library(dplyr)
# funs() has been deprecated since dplyr 0.8.0; across() needs dplyr >= 1.0.0
pop1 %>%
  summarise(across(everything(), ~ sum(.x > 5, na.rm = TRUE) / n()))
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
#1 0.6666667 0.3333333 0.3333333 0.1666667 0.5 0 0 0.3333333 0.8333333 0 0.6666667
If we need the result as a vector, then unlist it:
pop1 %>%
  summarise(across(everything(), ~ sum(.x > 5, na.rm = TRUE) / n())) %>%
  unlist(use.names = FALSE)