简体   繁体   English

R 中的 5x5 气泡图

[英]5x5 Bubble Chart in R

I am trying to reproduce a graphic similar to what is used on the California Dashboard for School Accountability in R Shiny.我正在尝试重现类似于在 R Shiny 中用于学校问责制的加利福尼亚仪表板中使用的图形。 For sake of this example, consider the following data frame:为了这个例子,考虑以下数据框:

student <- c("1234", "4321", "5678", "8765")
schools <- c("ABC", "ABC", "XYZ", "XYZ")
DFS_20 <- c(-34.2, -1.5, 2.8, 8.9)
DFS_21 <- c(-13.5, 27.8, 5.4, 3.9)
school_data <- data.frame("student_id" = student, "school_id" = schools, "DFS_2020" = DFS_20, "DFS_2021" = DFS_21, "Delta_DFS" = DFS_21 - DFS_20)

I would like to somehow plot this data on a grid like this:我想以某种方式在这样的网格上 plot 这个数据: 5x5 矩阵

where you would have a data point in at [x, y] = [4, 1] (with lower left being (0,0) representing student 1234 because their DFS_2021 score is LOW (-13.5) but their year-over-year growth increased significantly (20.7); a point in [x, y] = [4, 3] for student 4321 because their DFS_2021 score is HIGH (27.8) and their year-over-year growth increased significantly (29.3) etc. I want a Bubble chart so that the point increases in size relative to the number of data points within each cell, but I have no idea where to start creating the canvas (with colors) to overlay the data points onto. I know I can translate their scores into lattice points to plot on a 5x5 grid, but to make the grid with the colors is beyond my skillset.您将在 [x, y] = [4, 1] 处有一个数据点(左下角为 (0,0) 代表学生 1234,因为他们的 DFS_2021 分数为低(-13.5)但他们的年比增长显着增加 (20.7);学生 4321 在 [x, y] = [4, 3] 中有一个点,因为他们的 DFS_2021 得分很高 (27.8) 并且他们的同比增长显着增加 (29.3) 等等。我想要气泡图,以便点的大小相对于每个单元格内的数据点数量增加,但我不知道从哪里开始创建 canvas(带颜色)以覆盖数据点。我知道我可以翻译他们的分数在 5x5 网格上将网格点放入 plot ,但是用 colors 制作网格超出了我的技能范围。

In short you can recode the valus into factors and count each combination in your dataset.简而言之,您可以将值重新编码为因子并计算数据集中的每个组合。 With this new table (containing current DFS level, DFS difference level and number of students in each category) you can easily create a point-plot.使用这个新表(包含当前 DFS 级别、DFS 差异级别和每个类别的学生人数),您可以轻松创建点图。

To color code your points you may need an extra column in your table with the color.要对您的点进行颜色编码,您可能需要在表格中添加一个带有颜色的额外列。 Therefore I created a meta color table (all DFS combinations and the associated color) and joined the tables.因此,我创建了一个元颜色表(所有 DFS 组合和相关的颜色)并加入了这些表。

Code代码

# load packages
library(tidyverse)

# create color table
df_col <- crossing(DFS_current_status = factor(c("very low", "low", "medium",
                                                 "high", "very high"),
                                               levels = c("very high", "high", 
                                                              "medium", "low", "very low")), 
                   DFS_diff = factor(c("declined significantly", "declined",
                                "maintained" ,"increased", 
                                "increased significantly"), 
                                levels = c("declined significantly",
                                           "declined", "maintained",
                                           "increased", "increased significantly"))) %>%
  add_column(color = c("green", "green", "blue", "blue","blue",
                       "green", "green", "green", "green", "blue",
                       "yellow", "yellow", "yellow", "green", "green", 
                       "orange", "orange", "orange", "yellow", "yellow", 
                       "red", "red", "red", "orange", "orange"))


# transform data
df <- school_data %>%  
  mutate(DFS_current_status = case_when(DFS_2021 >=  45 ~ "very high",
                                        between(DFS_2021, 10, 44.9) ~ "high",
                                        between(DFS_2021, -5, 9.9) ~ "medium",
                                        between(DFS_2021, -70, -5.1) ~ "low",
                                        DFS_2021 < -70 ~ "very low",
                                        TRUE ~ NA_character_),
         DFS_diff = case_when(Delta_DFS < -15 ~ "declined significantly",
                              between(Delta_DFS, -15, -3) ~ "declined",
                              between(Delta_DFS, -2.9, 2.9) ~ "maintained",
                              between(Delta_DFS, 3, 14.9) ~ "increased",
                              Delta_DFS >= 15 ~ "increased significantly",
                              TRUE ~ NA_character_)) %>%
  count(DFS_current_status, DFS_diff) %>%
  left_join(df_col) %>%
  mutate(DFS_current_status = factor(DFS_current_status,
                                     levels = rev(c("very high", "high", 
                                                "medium", "low", "very low"))),
         DFS_diff = factor(DFS_diff, 
                           levels = c("declined significantly",
                                      "declined", "maintained",
                                      "increased", "increased significantly")))


# create plot
p <- ggplot(df) +
  geom_point(aes(x = DFS_diff,
                 y = DFS_current_status, 
                 size = n,
                 color = color)) +
  scale_y_discrete(drop = F) +
  scale_x_discrete(drop = F, position = "top") +
  scale_color_identity()

# display plot in plotly
ggplotly(p) %>% 
  layout(xaxis = list(side ="top")) 

Plot Plot 在此处输入图像描述

Edit: Comment - color the grid instead of point编辑:评论 - 为网格着色而不是点

df_col %>% 
  ggplot() +
  geom_raster(aes(x = DFS_diff, 
                  y = rev(DFS_current_status),
                  fill= color)) +
  scale_fill_identity() +
  scale_x_discrete(position = "top") +
  geom_point(data = df, aes(x = DFS_diff,
                 y = DFS_current_status,
                 size = n))

在此处输入图像描述

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM