How to calculate the proportion of certain observations in each variable in R?

Question

I have a data frame (population1) which consists of 11 million rows (observations) and 11 columns (individuals). The first few rows of my data frame look like this:

> head(population1)
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1  7  3 NA NA 10 NA NA NA NA  NA  NA
2 14 11  7 NA 12  3  4  5 14   3   6
3 13 11  7 NA 11  4 NA  4 13   3   4
4  3 NA  4  5  4 NA NA  6 17  NA   7
5  3 NA  5  5  4 NA NA  7 20  NA   8
6  6 NA  3  6 NA NA NA  5 16  NA  10

For each individual, I want to estimate the proportion of observations with values greater than 5. Is there an easy way to do this in R?

Answer 1

Here is a solution that uses sapply to apply a function to each column. The function counts how many observations are larger than 5 and divides that count by the length of x. Because length(x) includes NA values, missing observations count toward the denominator.

sapply(dt, function(x) sum(x > 5, na.rm = TRUE)/length(x))
       V1        V2        V3        V4        V5        V6        V7        V8        V9       V10 
0.6666667 0.3333333 0.3333333 0.1666667 0.5000000 0.0000000 0.0000000 0.3333333 0.8333333 0.0000000 
      V11 
0.6666667

DATA

dt <- read.table(text = "  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
                 1  7  3 NA NA 10 NA NA NA NA  NA  NA
                 2 14 11  7 NA 12  3  4  5 14   3   6
                 3 13 11  7 NA 11  4 NA  4 13   3   4
                 4  3 NA  4  5  4 NA NA  6 17  NA   7
                 5  3 NA  5  5  4 NA NA  7 20  NA   8
                 6  6 NA  3  6 NA NA NA  5 16  NA  10",
                 header = TRUE)
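
Dividing by length(x) keeps the missing values in the denominator. If the proportion should instead be taken over the non-missing observations only, a vectorised sketch along the following lines may help. It assumes the six sample rows shown in the question; the names prop_all and prop_nonmissing are made up for illustration.

```r
# Sample data from the question (the first column holds the row names)
dt <- read.table(text = "V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1  7  3 NA NA 10 NA NA NA NA  NA  NA
2 14 11  7 NA 12  3  4  5 14   3   6
3 13 11  7 NA 11  4 NA  4 13   3   4
4  3 NA  4  5  4 NA NA  6 17  NA   7
5  3 NA  5  5  4 NA NA  7 20  NA   8
6  6 NA  3  6 NA NA NA  5 16  NA  10",
                 header = TRUE)

# dt > 5 is a logical matrix; NA entries stay NA
# Denominator = all rows, so an NA is treated as "not greater than 5"
prop_all <- colSums(dt > 5, na.rm = TRUE) / nrow(dt)

# Denominator = non-missing values in each column only
prop_nonmissing <- colMeans(dt > 5, na.rm = TRUE)
```

For V2, for instance, the first version gives 2/6 and the second gives 2/3, since only three of its six values are non-missing. Which denominator is appropriate depends on what a missing observation means in the data.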

Answer 2

Here is an option using the tidyverse:

library(dplyr)
# pop1 is the data frame to summarise (here, the sample data from the question)
pop1 %>%
     summarise_all(funs(sum(.>5, na.rm = TRUE)/n()))
#         V1        V2        V3        V4  V5 V6 V7        V8        V9 V10       V11
#1 0.6666667 0.3333333 0.3333333 0.1666667 0.5  0  0 0.3333333 0.8333333   0 0.6666667

If we need the result as a vector, unlist it:

pop1 %>%
    summarise_all(funs(sum(.>5, na.rm = TRUE)/n())) %>%
    unlist(., use.names = FALSE)
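
funs() has been deprecated since dplyr 0.8.0, and summarise_all() was superseded by across() in dplyr 1.0.0, so on a current installation the same calculation might be written as below. This is only a sketch: it assumes dplyr 1.0.0 or later and that pop1 holds the sample data from the question; the names res and res_vec are illustrative.

```r
library(dplyr)

# Assumed: pop1 holds the sample rows from the question
pop1 <- read.table(text = "V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1  7  3 NA NA 10 NA NA NA NA  NA  NA
2 14 11  7 NA 12  3  4  5 14   3   6
3 13 11  7 NA 11  4 NA  4 13   3   4
4  3 NA  4  5  4 NA NA  6 17  NA   7
5  3 NA  5  5  4 NA NA  7 20  NA   8
6  6 NA  3  6 NA NA NA  5 16  NA  10",
                   header = TRUE)

# For every column: count of values > 5 divided by the number of rows
res <- pop1 %>%
  summarise(across(everything(), ~ sum(.x > 5, na.rm = TRUE) / n()))

# Plain numeric vector without names, as in the unlist() variant above
res_vec <- unlist(res, use.names = FALSE)
```

As in the original version, n() counts every row, so missing values remain in the denominator.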


Answer 1
4 Accepted 2017-09-27 01:18:55

Answer 2
1 2017-09-27 03:53:24
