
How to count and calculate percentages for two columns in an R data.frame?

In R, I have a data.frame like this:

df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

df1

   grade    sex class
1      A   male     1
2      B   male     1
3      C   male     1
4      D   male     1
5      E   male     1
6      A female     1
7      B female     1
8      C female     1
9      D female     1
10     E female     1
11     A   male     2
12     B   male     2
13     C   male     2
14     D   male     2
15     E female     2
16     A female     2
17     B female     2
18     C female     2
19     D female     2
20     E female     2

I want to calculate the percentage of each sex within each class and build another data.frame like this:

Class  Male_percent  Female_percent
1      50%           50%
2      40%           60%

Can someone show me how to do this? This has probably been asked before, but I don't know which keywords to search for. Apologies if it's a duplicate.
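The expected percentages follow directly from the per-class counts: class 1 has 5 male and 5 female rows, and class 2 has 4 male and 6 female rows, each out of 10. A quick base-R check of those counts (a minimal sketch that simply rebuilds `df1` from above; `counts` is a hypothetical name):

```r
# Rebuild the example data from the question
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Cross-tabulate: rows are classes, columns are sexes (sorted alphabetically: female, male)
counts <- table(class = df1$class, sex = df1$sex)
print(counts)
```

Dividing each row by its total (10) and multiplying by 100 gives exactly the 50/50 and 40/60 split in the desired output.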

You could try

 prop.table(table(df1[3:2]),1)*100
 #    sex
 #class female male
 #  1     50   50
 #  2     60   40
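`prop.table()` returns a 2 x 2 table rather than the one-row-per-class data.frame asked for. A minimal base-R sketch that reshapes it, assuming the column names from the question (`Class`, `Male_percent`, `Female_percent`) and that `sex` has exactly the values `"male"` and `"female"`:

```r
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Row-wise proportions (margin = 1), so each class row sums to 100
pct <- prop.table(table(df1$class, df1$sex), margin = 1) * 100

# Pull the columns out by name and append "%" to get the requested layout
res <- data.frame(
  Class = as.integer(rownames(pct)),
  Male_percent = paste0(pct[, "male"], "%"),
  Female_percent = paste0(pct[, "female"], "%"),
  stringsAsFactors = FALSE
)
print(res)
```

Indexing by column name rather than position avoids depending on the alphabetical column order (`female` before `male`) that `table()` produces.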

Or using data.table:

 library(data.table)
 setDT(df1)[, .N, by = .(class, sex)
          ][, .(Male_percent = paste0(100 * N[sex == 'male'] / sum(N), '%'), 
              Female_percent = paste0(100 * N[sex == 'female'] / sum(N), '%')), 
           by = class] 
 #   class Male_percent   Female_percent
 #1:     1          50%              50%
 #2:     2          40%              60%

Or using dplyr:

 library(dplyr)
 df1 %>%
     group_by(class) %>% 
     # '%.0f' instead of '%d': '%d' errors when a percentage is not a whole number
     summarise(Male_Percent = sprintf('%.0f%%', 100*sum(sex=='male')/n()), 
               Female_Percent = sprintf('%.0f%%', 100*sum(sex=='female')/n()))
 #    class Male_Percent Female_Percent
 #1     1          50%            50%
 #2     2          40%            60%
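The dplyr answer computes each share as `sum(sex == 'male') / n()`, which is just the mean of a logical vector. A package-free sketch of the same idea with base R's `tapply()`, formatted with `sprintf("%.0f%%")`, which rounds to whole percents and, unlike `"%d"`, also accepts non-integer values (`sprintf("%d", 33.3)` is an error in R). The names `male_share`, `female_share` and `res` are illustrative only:

```r
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# mean() of a logical vector is the share of TRUE values, i.e. sum(x) / length(x)
male_share <- tapply(df1$sex == "male", df1$class, mean)
female_share <- tapply(df1$sex == "female", df1$class, mean)

res <- data.frame(
  class = as.integer(names(male_share)),
  # as.vector() drops the array attributes that tapply() attaches
  Male_Percent = sprintf("%.0f%%", 100 * as.vector(male_share)),
  Female_Percent = sprintf("%.0f%%", 100 * as.vector(female_share)),
  stringsAsFactors = FALSE
)
print(res)
```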

Or using sqldf:

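  library(sqldf)
  # SQL string literals in single quotes; SQLite integer division truncates,
  # so m and f are whole-number percentages
  res1 <- sqldf("select class, 
            100 * sum(sex = 'male') / count(sex) as m, 
            100 * sum(sex = 'female') / count(sex) as f,
            '%' as p
            from df1
            group by class")
   sqldf("select class,
           m || p as Male_Percent, 
           f || p as Female_Percent 
           from res1")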
   #  class Male_Percent Female_Percent
   #1     1          50%            50%
   #2     2          40%            60%

Update

Based on @G.Grothendieck's comment, the sqldf code can be simplified to:

   sqldf("select class,
        (100 * avg(sex = 'male')) || '%' as Male_Percent,
        (100 * avg(sex = 'female')) || '%' as Female_Percent
        from df1
        group by class")
   #     class Male_Percent Female_Percent
   #1     1        50.0%          50.0%
   #2     2        40.0%          60.0%

Try tabyl() from the janitor package:

library(janitor)
df1 %>%
  tabyl(class, sex) %>%
  adorn_percentages()

 class female male
     1    0.5  0.5
     2    0.6  0.4

To format the result as percentages, add adorn_pct_formatting():

df1 %>%
  tabyl(class, sex) %>%
  adorn_percentages() %>%
  adorn_pct_formatting()

 class female  male
     1  50.0% 50.0%
     2  60.0% 40.0%

Disclaimer: I am the author of these functions.

With the data.table package, you can do the following:

library(data.table)
setDT(df1)[, .(
    Male_Percent   = paste0(nrow(.SD[sex == "male"]) / .N * 100, "%"),
    Female_Percent = paste0(nrow(.SD[sex == "female"]) / .N * 100, "%")
  ),
  by = class]

Result:

#     class      Male_Percent  Female_Percent
# 1:     1          50%            50%
# 2:     2          40%            60%

Another dplyr solution would be:

df1 %>%
  group_by(sex, class) %>%
  summarise(n = n()) %>%
  group_by(class) %>%
  summarise(
    Male_Percent   = paste0(n[sex == "male"] / sum(n) * 100, "%"),
    Female_Percent = paste0(n[sex == "female"] / sum(n) * 100, "%")
  )

#  class   Male_Percent     Female_Percent
#   1          50%            50%
#   2          40%            60%


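Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.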

 
© 2020-2024 STACKOOM.COM