How to calculate the proportion of certain observations in each variable in R?

Question

I have a data frame (population1) with 11 million rows (observations) and 11 columns (individuals). The first few rows of my data frame look like this:

> head(population1)
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1  7  3 NA NA 10 NA NA NA NA  NA  NA
2 14 11  7 NA 12  3  4  5 14   3   6
3 13 11  7 NA 11  4 NA  4 13   3   4
4  3 NA  4  5  4 NA NA  6 17  NA   7
5  3 NA  5  5  4 NA NA  7 20  NA   8
6  6 NA  3  6 NA NA NA  5 16  NA  10

For each individual, I want to estimate the proportion of observations with values greater than 5. Is there an easy way to do this in R?

Answer 1

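Here is a solution that uses sapply to apply a function to each column. The function counts how many observations are larger than 5 and divides that count by the length of x. Because length(x) also counts NA values, the proportion is taken over all rows, not only the non-missing ones.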

sapply(dt, function(x) sum(x > 5, na.rm = TRUE)/length(x))
       V1        V2        V3        V4        V5        V6        V7        V8        V9       V10 
0.6666667 0.3333333 0.3333333 0.1666667 0.5000000 0.0000000 0.0000000 0.3333333 0.8333333 0.0000000 
      V11 
0.6666667

DATA

dt <- read.table(text = "  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1  7  3 NA NA 10 NA NA NA NA  NA  NA
2 14 11  7 NA 12  3  4  5 14   3   6
3 13 11  7 NA 11  4 NA  4 13   3   4
4  3 NA  4  5  4 NA NA  6 17  NA   7
5  3 NA  5  5  4 NA NA  7 20  NA   8
6  6 NA  3  6 NA NA NA  5 16  NA  10",
                 header = TRUE)
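
The choice of denominator in the sapply call matters when a column contains NA values. The following is a minimal sketch, not part of the original answer, that contrasts the two readings of "proportion" using colSums and colMeans from base R. It assumes only the first three columns of the sample data, typed in by hand, and the names prop_all_rows and prop_non_missing are made up for illustration.

```r
# First three columns of the sample data, entered by hand for illustration
dt <- data.frame(
  V1 = c(7, 14, 13, 3, 3, 6),
  V2 = c(3, 11, 11, NA, NA, NA),
  V3 = c(NA, 7, 7, 4, 5, 3)
)

# Share of all rows with a value above 5: NA rows stay in the denominator,
# which is what sum(x > 5, na.rm = TRUE) / length(x) computes per column
prop_all_rows <- colSums(dt > 5, na.rm = TRUE) / nrow(dt)

# Share of non-missing values above 5: NA rows are dropped from the denominator
prop_non_missing <- colMeans(dt > 5, na.rm = TRUE)

print(prop_all_rows)
print(prop_non_missing)
```

Note that dt > 5 builds a full logical matrix of the same shape as the data, so on a very large data frame the column-by-column sapply approach may need less memory.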

Answer 2

Here is an option using dplyr from the tidyverse. pop1 is assumed to hold the same sample data as dt above.

library(dplyr)
pop1 %>%
     summarise_all(~ sum(. > 5, na.rm = TRUE) / n())
#         V1        V2        V3        V4  V5 V6 V7        V8        V9 V10       V11
#1 0.6666667 0.3333333 0.3333333 0.1666667 0.5  0  0 0.3333333 0.8333333   0 0.6666667

If a plain vector is needed, unlist the result:

pop1 %>%
    summarise_all(~ sum(. > 5, na.rm = TRUE) / n()) %>%
    unlist(., use.names = FALSE)
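
As a rough sketch of the same idea in more recent dplyr, the example below uses across(), which assumes dplyr 1.0.0 or later, and shows what use.names does when the result is unlisted. The two-column pop1 here is a hand-typed subset of the sample data, and the names props, named_vec, unnamed_vec and base_vec are illustrative only.

```r
library(dplyr)

# Two columns of the sample data, entered by hand for illustration
pop1 <- data.frame(
  V1 = c(7, 14, 13, 3, 3, 6),
  V2 = c(3, 11, 11, NA, NA, NA)
)

# One-row data frame: proportion of all rows with a value above 5, per column
props <- pop1 %>%
  summarise(across(everything(), ~ sum(.x > 5, na.rm = TRUE) / n()))

named_vec   <- unlist(props)                     # keeps the column names
unnamed_vec <- unlist(props, use.names = FALSE)  # drops the column names

# Cross-check against the base R sapply approach
base_vec <- sapply(pop1, function(x) sum(x > 5, na.rm = TRUE) / length(x))
print(all.equal(named_vec, base_vec))
```

Leaving use.names at its default keeps the column labels attached to each proportion, which is usually easier to read than a bare numeric vector.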

2 answers

solution1
4 ACCEPTED 2017-09-27 01:18:55

solution2
1 2017-09-27 03:53:24
