dplyr: how to calculate the frequency of different values within each group

Question

I probably have a fairly easy question, but I cannot figure it out.

I have a dataset with two variables, both factors. It looks like this:

my.data<-data.frame(name=c("a","a","b","b","b","b", "b", "b", "e", "e", "e"),
                var1=c(1, 2, 3, 4, 2, 1, 4, 1, 3, 4, 3))

I would like to calculate the frequency of 1, 2, 3 and 4 within each of a, b and e, and then collapse the result so that each name occupies a single row. That is, "a", "b" and "e" should each be in one row, with four new variables giving the frequency of 1, 2, 3 and 4 for that name. I have managed to calculate the frequencies for every combination of name and value, but I can't collapse the result into one row per name.

This is my code:

library(dplyr)

a <- my.data %>%
  dplyr::select(name, var1) %>%
  mutate(name = as.factor(name),
         var1 = as.factor(var1)) %>%
  group_by(name, var1) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

My results should look like this:

name   Freq1   Freq2   Freq3   Freq4
  a    0,00    0,00    0,5     0,5
  b    0,30    0,30    0,30    0,10
  e    0,20    0,20    0,20    0,40

Thanks.

Answer 1

You could also use base R's prop.table():

prop.table(table(my.data), 1)

returning

    var1
name         1         2         3         4
   a 0.5000000 0.5000000 0.0000000 0.0000000
   b 0.3333333 0.1666667 0.1666667 0.3333333
   e 0.0000000 0.0000000 0.6666667 0.3333333
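
If the result is needed as a data frame with Freq1 ... Freq4 columns, as in the question, rather than as a table object, something along these lines might work. This is only a sketch: the object names tab and wide are made up for illustration, and it assumes my.data has exactly the two columns shown in the question.

```r
my.data <- data.frame(name = c("a","a","b","b","b","b","b","b","e","e","e"),
                      var1 = c(1, 2, 3, 4, 2, 1, 4, 1, 3, 4, 3))

# Row-wise proportions: each row (name) sums to 1
tab <- prop.table(table(my.data), 1)

# Turn the 2-way table into a plain data frame, one row per name
wide <- as.data.frame.matrix(tab)

# Prefix the value columns and move the row names into a "name" column
names(wide) <- paste0("Freq", names(wide))
wide <- cbind(name = rownames(wide), wide)
rownames(wide) <- NULL

wide
```

Combinations that never occur come out as 0 here, because table() counts them as zero.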

Answer 2

We can also make use of the janitor package to great advantage here:

library(janitor)

my.data %>%
  tabyl(name, var1) %>%
  adorn_percentages()

 name         1         2         3         4
    a 0.5000000 0.5000000 0.0000000 0.0000000
    b 0.3333333 0.1666667 0.1666667 0.3333333
    e 0.0000000 0.0000000 0.6666667 0.3333333

Or:

my.data %>%
  tabyl(name, var1) %>%
  adorn_percentages() %>%
  adorn_totals(c('row', 'col')) %>%
  adorn_pct_formatting(2)

  name      1      2      3      4   Total
     a 50.00% 50.00%  0.00%  0.00% 100.00%
     b 33.33% 16.67% 16.67% 33.33% 100.00%
     e  0.00%  0.00% 66.67% 33.33% 100.00%
 Total 83.33% 66.67% 83.33% 66.67% 300.00%

Answer 3

You can use pivot_wider to bring the data into wide format:

library(dplyr)
library(tidyr)

my.data %>%
  count(name, var1) %>%
  group_by(name) %>%
  mutate(n = prop.table(n)) %>%
  ungroup() %>%
  pivot_wider(names_from = var1, values_from = n, names_prefix = 'Freq')

#  name   Freq1  Freq2  Freq3  Freq4
#  <chr>  <dbl>  <dbl>  <dbl>  <dbl>
#1 a      0.5    0.5   NA     NA    
#2 b      0.333  0.167  0.167  0.333
#3 e     NA     NA      0.667  0.333
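
The NA entries above are name/value combinations that never occur in the data. If zeros are preferred, as in the question's expected output, the values_fill argument of pivot_wider should do it. A sketch under that assumption (the names freq and wide_tidy are only illustrative):

```r
library(dplyr)
library(tidyr)

my.data <- data.frame(name = c("a","a","b","b","b","b","b","b","e","e","e"),
                      var1 = c(1, 2, 3, 4, 2, 1, 4, 1, 3, 4, 3))

wide_tidy <- my.data %>%
  count(name, var1) %>%              # one row per (name, var1) with its count n
  group_by(name) %>%
  mutate(freq = n / sum(n)) %>%      # share of each value within its name
  ungroup() %>%
  select(-n) %>%                     # drop the raw count so only name identifies a row
  pivot_wider(names_from = var1, values_from = freq,
              names_prefix = "Freq",
              values_fill = 0)       # missing combinations become 0 instead of NA

wide_tidy
```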

Answer 4

library(purrr)
my.data %>% 
  split(.$name) %>% 
  {cbind(name = names(.), map_dfr(., ~pluck(.x, "var1") %>% table() %>% prop.table()))}

  name         1         2         3         4
1    a 0.5000000 0.5000000        NA        NA
2    b 0.3333333 0.1666667 0.1666667 0.3333333
3    e        NA        NA 0.6666667 0.3333333

4 answers

Answer 1
3 2021-06-11 12:21:09

Answer 2
3 accepted 2021-06-11 12:32:01

Answer 3
0 2021-06-11 12:04:32

Answer 4
0 2021-06-11 12:20:55
