
How to calculate proportion by groups with dplyr?

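My dataset has two groups, A and B, with 160 rows in total.

I want to know how to compute, within each group, the proportion of items that have:

  • value > 6.5
  • value < 3.5
  • value < 3.5 or value > 6.5

My dataset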

Dados = structure(list(Espessura = c(5.7, 4.3, 5.7, 5.3, 3.1, 3, 3.6, 
5.9, 4.4, 3.1, 5.8, 3.7, 5.9, 5.3, 6.7, 6, 4.2, 4.1, 2.8, 4.3, 
4.6, 4.7, 3.1, 5, 2.6, 5.2, 6.2, 5.4, 5.7, 3.4, 5.4, 6.9, 5.8, 
4, 5.8, 5.4, 4.7, 5.9, 3.6, 3.5, 5.9, 5.4, 6.5, 4.2, 4.4, 2.4, 
5.3, 6.2, 4.5, 5.9, 4.1, 6.7, 5.8, 5.9, 2.9, 6.8, 5.7, 3.5, 3.5, 
6.1, 5.5, 5.6, 4, 3.9, 3.8, 2.8, 5.5, 3.5, 5.5, 4.1, 2.9, 5.7, 
5.7, 2.7, 3.7, 5.6, 3.8, 5.9, 3, 4.9, 4.9, 6.5, 3.9, 2.3, 4.5, 
6.4, 5.8, 5.7, 5.1, 2.9, 6, 5.8, 5.1, 4.5, 4.5, 4, 5.4, 7, 3.3, 
6, 3.1, 6.3, 4.3, 5.3, 4.9, 5.6, 6, 2.8, 5.6, 3.5, 4, 6.5, 4.6, 
6.2, 6.4, 4, 2.4, 5.7, 6.3, 5.3, 3.7, 6.1, 5.7, 5.7, 3.7, 5.6, 
6.1, 3, 3.8, 5.7, 6.6, 5.8, 3.3, 2.7, 5.7, 6.4, 5.8, 3.5, 5.4, 
4.2, 6.1, 5.3, 5.4, 3.1, 5.1, 3.9, 6.4, 3.4, 6.7, 2.4, 5.1, 5.7, 
3.1, 6.2, 6.3, 4.9, 6.5, 4.5, 6.1, 5.7), Turma = structure(c(2L, 
1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 
1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor")), class = "data.frame", row.names = c(NA, 
-160L))

My script

library("dplyr")
Dados %>%
  group_by(Turma) %>%
  summarise(n = n()) %>%
  mutate(menor = dim(filter(Dados, Espessura < 3.5)) / n * 100) %>%
  mutate(maior = dim(filter(Dados, Espessura > 6.5)) / n * 100) %>%
  mutate(fora = dim(filter(Dados, Espessura < 3.5 | Espessura > 6.5)) / n * 100)

Wrong result

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 5
  Turma     n menor maior  fora
  <fct> <int> <dbl> <dbl> <dbl>
1 A        80  32.5  8.75  41.2
2 B        80   2.5  2.5    2.5

Expected result

[image of the expected result, not available]

library(dplyr)
library(magrittr)
Dados %>%
  group_by(Turma) %>%
  summarise(n = n(),
            menor = mean( Espessura < 3.5)*100,
            maior = mean( Espessura > 6.5)*100,
            fora = mean( Espessura < 3.5 |  Espessura > 6.5)*100)

This gives:

# A tibble: 2 x 5
  Turma     n menor maior  fora
  <fct> <int> <dbl> <dbl> <dbl>
1 A        80  32.5  0    32.5 
2 B        80   0    8.75  8.75
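The reason mean() works here is that R treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the proportion of TRUE values. A minimal base R sketch of that idea, using a small made-up vector x (hypothetical values, not taken from the question's data):

```r
# Hypothetical toy values, chosen only for illustration
x <- c(2.0, 4.0, 7.0, 3.0)

# Logical vector: TRUE where the value is below 3.5
below <- x < 3.5

# mean() coerces TRUE/FALSE to 1/0, so this is the share of TRUE values
prop_below <- mean(below)

# The same quantity written out as count / total
prop_manual <- sum(below) / length(x)

# Percentage of values below 3.5
print(prop_below * 100)
```

Inside summarise() after group_by(), the same expression is evaluated once per group, which is why no explicit division by n is needed.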

First, I used the data.table approach to get the correct result.

library(data.table)
dt <- setDT(Dados)
dt[,.(n = .N,
      menor = nrow(.SD[Espessura < 3.5])/.N*100,
      maior = nrow(.SD[Espessura > 6.5])/.N*100,
      fora = nrow(.SD[Espessura < 3.5 | Espessura > 6.5])/.N*100),
   by = Turma]

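Then I found that the first wrong step in your script is dim(filter(Dados, Espessura < 3.5)). It filters the whole of Dados, ignoring the grouping, and returns a two-element vector (number of rows, number of columns) whose second element is always 2, rather than one row count per group. Dividing that vector by n pairs the row count with group A and the column count with group B, which is where the 2.5 in every column of row B comes from.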
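To make that arithmetic concrete, here is a small base R sketch. It assumes a hypothetical data frame toy with columns value and group (not the question's data), and shows how dividing dim() by the per-group sizes differs from computing the proportion within each group:

```r
# Hypothetical toy data: two groups of 3 rows each
toy <- data.frame(
  value = c(1, 2, 5, 6, 7, 3),
  group = c("A", "A", "A", "B", "B", "B")
)

# Rows below 3.5 across the WHOLE table (grouping plays no role here)
low <- toy[toy$value < 3.5, ]
d <- dim(low)  # c(number of rows, number of columns)

# Per-group sizes, standing in for what summarise(n = n()) returns
n <- c(3, 3)

# Element-wise division: rows / n[1] and columns / n[2]
wrong <- d / n * 100

# Proportion computed within each group instead
right <- tapply(toy$value < 3.5, toy$group, mean) * 100

print(wrong)
print(right)
```

The second element of wrong is derived from the column count, so it says nothing about group B; right is computed separately for each group.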
