Calculate proportion of several binary variables by another variable

I have data with several binary variables, and I want to calculate the proportion of each one, grouped by another variable.

Example

I survey people and ask them:
Please mark which of the following fruits you like (you can mark more than one):
☐ Banana ☐ Apple ☐ Orange ☐ Strawberry ☐ Peach

A person who checks a box gets 1 for that fruit in the data, and a box left blank is recorded as 0. The data looks like this:

library(dplyr)

set.seed(2021)

my_df <-
  # 20 respondents x 5 fruits of random 0/1 answers
  matrix(rbinom(n = 100, size = 1, prob = runif(1)), ncol = 5) %>%
  as.data.frame() %>%
  # prepend an id column and append a random gender column
  cbind(1:20, ., sample(c("male", "female"), size = 20, replace = TRUE)) %>%
  setNames(c("person_id", "banana", "apple", "orange", "strawberry", "peach", "gender"))

my_df
#>    person_id banana apple orange strawberry peach gender
#> 1          1      1     1      1          0     0 female
#> 2          2      1     0      0          0     1 female
#> 3          3      0     0      1          0     1 female
#> 4          4      1     1      0          1     0 female
#> 5          5      1     1      1          0     0   male
#> 6          6      1     1      1          0     1 female
#> 7          7      0     1      0          1     1   male
#> 8          8      1     1      0          0     0   male
#> 9          9      1     1      1          0     0 female
#> 10        10      0     0      0          0     0   male
#> 11        11      1     1      1          1     1   male
#> 12        12      1     1      0          0     1   male
#> 13        13      1     1      0          1     0   male
#> 14        14      1     1      0          0     0   male
#> 15        15      0     0      0          0     1   male
#> 16        16      0     1      0          0     1   male
#> 17        17      1     0      0          0     1   male
#> 18        18      1     1      1          1     1   male
#> 19        19      0     0      1          1     1 female
#> 20        20      0     0      0          0     0 female

Created on 2021-02-01 by the reprex package (v0.3.0)

I want to get the proportion for each fruit, split by gender. From another answer I learned how to do it for one variable (for example, banana):

my_df %>%
  group_by(gender) %>%
  summarise(n_of_observations = n(), prop = sum(banana == 1)/n())

## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
##   gender n_of_observations  prop
##   <chr>              <int> <dbl>
## 1 female                10   0.6
## 2 male                  10   0.4
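A side note on the single-variable version: because each fruit column holds only 0 and 1, the share of 1s in a group equals the group mean, so sum(banana == 1)/n() and mean(banana == 1) give the same number. A minimal sketch on a small hypothetical data frame (toy, not the data above), assuming dplyr 1.0.0 or later for the .groups argument:

```r
library(dplyr)

# Hypothetical toy data: two respondents per gender, 0/1 answers for one fruit
toy <- data.frame(
  gender = c("female", "female", "male", "male"),
  banana = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

banana_props <- toy %>%
  group_by(gender) %>%
  summarise(
    n_of_observations = n(),
    prop_sum = sum(banana == 1) / n(),  # formula from the question
    prop_mean = mean(banana == 1),      # same value: mean of a logical vector
    .groups = "drop"
  )

banana_props
```

Here female gets 0.5 (one of two) and male gets 1 (two of two) in both columns.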

But how can I get such a table for all fruits?

Desired output:

##    fruit      gender  prop
##    <chr>      <chr>  <dbl>
##  1 banana     female   0.6
##  2 banana     male     0.4
##  3 apple      female   0.4
##  4 apple      male     0.3
##  5 orange     female   0.3
##  6 orange     male     0.1
##  7 strawberry female   0.4
##  8 strawberry male     0.4
##  9 peach      female   0.3
## 10 peach      male     0.6

I'm looking for a dplyr solution, if possible. Thanks a lot!

You can use across() to summarize several variables at once:

my_df %>%
  group_by(gender) %>%
  # for every column from banana to peach, compute the group size and the share of 1s
  summarise(across(banana:peach, list(n = ~length(.x), prop = ~sum(.x == 1) / n())))


# A tibble: 2 x 11
  gender banana_n banana_prop apple_n apple_prop orange_n orange_prop strawberry_n strawberry_prop peach_n peach_prop
  <chr>     <int>       <dbl>   <int>      <dbl>    <int>       <dbl>        <int>           <dbl>   <int>      <dbl>
1 female        8       0.625       8       0.5         8       0.625            8           0.25        8      0.5  
2 male         12       0.667      12       0.75       12       0.25            12           0.333      12      0.583

Note that the first argument of across() specifies the variables you want to summarize. Here, banana:peach selects all columns from banana through peach, inclusive.
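The across() result is wide (one column per fruit), while the desired output in the question is long (one row per fruit and gender). One way to get there is to summarise with across() and then pivot the result. This is a sketch on a hypothetical toy data frame, assuming dplyr 1.0.0+ (for relocate() and .groups) and tidyr 1.0.0+ (for pivot_longer()):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two of the fruit columns
toy <- data.frame(
  person_id = 1:4,
  banana = c(1, 0, 1, 1),
  apple = c(0, 0, 1, 0),
  gender = c("female", "female", "male", "male"),
  stringsAsFactors = FALSE
)

long_props <- toy %>%
  group_by(gender) %>%
  # one mean per fruit column; the mean of a 0/1 column is its share of 1s
  summarise(across(banana:apple, mean), .groups = "drop") %>%
  # wide (one column per fruit) -> long (one row per fruit and gender)
  pivot_longer(-gender, names_to = "fruit", values_to = "prop") %>%
  relocate(fruit) %>%
  arrange(fruit, gender)

long_props
```

On my_df the same pipeline would use across(banana:peach, mean), and the proportions should match the prop columns in the output above.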

You can use tidyr to pivot your data first and then summarize it:

library(tidyr)

tidyr::pivot_longer(my_df, banana:peach,
                    names_to = "fruit") %>% 
  dplyr::group_by(gender, fruit) %>% 
  dplyr::summarize(prop = sum(value) / n())

   gender fruit       prop
   <chr>  <chr>      <dbl>
 1 female apple      0.5  
 2 female banana     0.625
 3 female orange     0.625
 4 female peach      0.5  
 5 female strawberry 0.25 
 6 male   apple      0.75 
 7 male   banana     0.667
 8 male   orange     0.25 
 9 male   peach      0.583
10 male   strawberry 0.333

You can pipe the result to arrange() if you want to sort by fruit. You can also add the number of observations to the summarize() call with n = n().
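Combining both suggestions, here is a sketch of the pivot-first approach with a per-group count and sorting by fruit. It runs on a hypothetical toy data frame and assumes tidyr 1.0.0+ and dplyr 1.0.0+ (for the .groups argument):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two of the fruit columns
toy <- data.frame(
  person_id = 1:4,
  banana = c(1, 0, 1, 1),
  apple = c(0, 0, 1, 0),
  gender = c("female", "female", "male", "male"),
  stringsAsFactors = FALSE
)

fruit_props <- toy %>%
  # long format: one row per person and fruit, answers in the "value" column
  pivot_longer(banana:apple, names_to = "fruit") %>%
  group_by(fruit, gender) %>%
  summarise(prop = sum(value) / n(), n = n(), .groups = "drop") %>%
  arrange(fruit, gender)

fruit_props
```

Computing prop before n keeps n() unambiguous inside summarise(); mean(value) would give the same prop.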
