
How to count and group multiple columns in an R dataframe?

Really basic question... I have a dataframe like the one below, where the numbers indicate a score:

df <- data.frame(A = c(1, 2, 1, 1, 3, 3, 2, 2),
                 B = c(2, 2, 2, 3, 2, 3, 3, 1),
                 C = c(1, 1, 1, 1, 1, 2, 2, 3))


And I would like to change it to a long format (one row per column and score, with the frequency of that score) to plot it in a stacked bar chart.

I know how to do it in a very roundabout and probably overly complicated way, and any suggestions on a more "streamlined" way to do it would be very welcome! Thanks in advance!

library(tidyverse)

df %>% 
  pivot_longer(everything(), names_to = "Score") %>% 
  count(Score, value, name = "Freq")

# A tibble: 9 × 3
  Score value  Freq
  <chr> <dbl> <int>
1 A         1     3
2 A         2     3
3 A         3     2
4 B         1     1
5 B         2     4
6 B         3     3
7 C         1     5
8 C         2     2
9 C         3     1
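
Since the goal in the question is a stacked bar chart, the counted result can be passed straight to ggplot2. The following is only a sketch under a few assumptions: the axis and legend labels are my own choice, and the column names `Score`, `value` and `Freq` are simply the ones produced by the code above (so `Score` holds the original column name and `value` holds the score).

```r
library(tidyr)
library(dplyr)
library(ggplot2)

df <- data.frame(A = c(1, 2, 1, 1, 3, 3, 2, 2),
                 B = c(2, 2, 2, 3, 2, 3, 3, 1),
                 C = c(1, 1, 1, 1, 1, 2, 2, 3))

# Same counting step as above: "Score" holds the original column name,
# "value" holds the score, and "Freq" the number of occurrences.
counts <- df %>%
  pivot_longer(everything(), names_to = "Score") %>%
  count(Score, value, name = "Freq")

# One bar per original column; geom_col() stacks the fill groups by default.
# The labels below are illustrative, not taken from the question.
p <- ggplot(counts, aes(x = Score, y = Freq, fill = factor(value))) +
  geom_col() +
  labs(x = "Column", y = "Frequency", fill = "Score")

print(p)
```

Wrapping `value` in `factor()` makes ggplot2 treat the scores as discrete categories rather than a continuous colour scale.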

The dplyr solutions are likely more scalable, but here is an alternative base R approach: use do.call along with lapply and table, then put it all in a data.frame:

# Note: this assumes every column contains each of the scores 1, 2 and 3
data.frame(Name = rep(c("A", "B", "C"), each = 3),
      Score = rep(1:3, times = 3),
      Frequency = do.call(c, lapply(df, table)))

#     Name Score Frequency
# A.1    A     1         3
# A.2    A     2         3
# A.3    A     3         2
# B.1    B     1         1
# B.2    B     2         4
# B.3    B     3         3
# C.1    C     1         5
# C.2    C     2         2
# C.3    C     3         1
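
The approach above hard-codes the column names and the score range, and it relies on every column containing every score. A possible generalisation, offered only as a sketch, derives both from the data and tabulates each column against a shared set of factor levels so that missing scores show up as zero counts. The names `lv`, `counts` and `res` are illustrative.

```r
df <- data.frame(A = c(1, 2, 1, 1, 3, 3, 2, 2),
                 B = c(2, 2, 2, 3, 2, 3, 3, 1),
                 C = c(1, 1, 1, 1, 1, 2, 2, 3))

# All scores that occur anywhere in the data, in ascending order.
lv <- sort(unique(unlist(df)))

# Tabulate each column against the same levels; the result is a matrix
# with one row per score and one column per original column.
counts <- sapply(df, function(col) table(factor(col, levels = lv)))

# as.vector() reads the matrix column by column, which matches the
# ordering of Name (each column name repeated) and Score (levels cycled).
res <- data.frame(Name = rep(names(df), each = length(lv)),
                  Score = rep(lv, times = ncol(df)),
                  Frequency = as.vector(counts))

print(res)
```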

Using base R:

 as.data.frame(table(stack(df)[2:1]))
  ind values Freq
1   A      1    3
2   B      1    1
3   C      1    5
4   A      2    3
5   B      2    4
6   C      2    2
7   A      3    2
8   B      3    3
9   C      3    1
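
If the only aim is the stacked bar chart, base R may not need the long data frame at all: `barplot()` stacks the rows of a matrix within each column's bar, and `table(stack(df))` is already such a matrix (scores in rows, original columns in columns). A minimal sketch, with axis labels and legend title chosen here for illustration:

```r
df <- data.frame(A = c(1, 2, 1, 1, 3, 3, 2, 2),
                 B = c(2, 2, 2, 3, 2, 3, 3, 1),
                 C = c(1, 1, 1, 1, 1, 2, 2, 3))

# stack(df) has columns "values" and "ind", so the table has
# scores as rows and the original column names as columns.
counts <- table(stack(df))

# Each bar is one original column; the segments are the score counts.
barplot(counts,
        legend.text = TRUE,
        args.legend = list(title = "Score"),
        xlab = "Column",
        ylab = "Frequency")
```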

We can turn the data into long format and then calculate the frequency:

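library(tidyverse)

df %>%
  gather(Name, Score, A:C) %>%     # wide to long: one row per (Name, Score)
  group_by(Name, Score) %>%
  summarise(Frequency = n()) %>%   # count the rows in each group
  ungroup()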

  Name  Score Frequency
  <chr> <dbl>     <int>
1 A         1         3
2 A         2         3
3 A         3         2
4 B         1         1
5 B         2         4
6 B         3         3
7 C         1         5
8 C         2         2
9 C         3         1
