
Box plot with numeric and categorical variables

I want to create a box plot that shows the distribution of several numeric variables, all on the same scale, against one categorical variable, so that I can compare the measures within each level of the factor.

For example, I want to see how much the quantity (in thousands of $) of the shipments ordered by 3 customers differs by product type. Take this example data:

prueba <- data.frame("client1" = truncnorm::rtruncnorm(n = 60, a = 1, b = 9.8, mean = 6.5, sd = 1),
                     "client2" = truncnorm::rtruncnorm(n = 60, a = 1, b = 9.8, mean = 6.9, sd = 2),
                     "client3" = truncnorm::rtruncnorm(n = 60, a = 1, b = 9.8, mean = 5, sd = 3),
                     "type" = as.factor(sample(LETTERS[1:3], 60, replace = T, prob = c(0.4,0.35,0.25))),
                     "cat" = as.factor(sample(LETTERS[20:22], 60, replace = T, prob = c(0.5, 0.1,0.4))))
prueba[,1:3] <- round(prueba[,1:3], 1)
head(prueba)
#  client1 client2 client3 type cat
#1     6.3     7.2     7.0    B   T
#2     7.2     6.5     3.5    C   T
#3     8.0     6.4     8.0    A   V
#4     8.0     7.4     7.0    A   V
#5     7.5     7.6     2.5    B   V
#6     7.0     9.0     3.7    A   V

With ggplot I can do this:

library(tidyverse)
library(patchwork)

uno <- prueba %>% ggplot(aes(x = type, 
                      y = client1)) +
        geom_boxplot()+scale_y_continuous(limits = c(0,10))

dos <- prueba %>% ggplot(aes(x = type, 
                             y = client2)) +
        geom_boxplot()

tres <- prueba %>% ggplot(aes(x = type, 
                              y = client3)) +
        geom_boxplot()

uno+dos+tres+plot_layout(byrow = F)

I get this:

[Image: Differences in distributions]

However, I want something like this:

[Image]

But instead of the boxes at each x-axis position being filled by another category, I want them to be filled by the distribution of each client.

  1. Is this possible?

  2. How can I do this in R?

  3. Are there other visualization methods that do the same?

Are you looking for something like this?

prueba2 <- prueba %>% 
  pivot_longer(cols = starts_with("client"), names_to = "client")

ggplot(data = prueba2, aes(x = type,
                           y = value,
                           fill = client)) +
  geom_boxplot()


If so, first gather all the client# columns into a single column "client", with the corresponding values in another column "value", using pivot_longer (from the package tidyr, which is already part of tidyverse). Then let ggplot do the rest. All we have to tell it is: the x-axis is 'type', the y-axis is 'value', and 'client' is the fill color.
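To make the reshaping step concrete, here is a minimal sketch on a tiny hand-made data frame. The name `toy` and its values are made up for illustration and only stand in for `prueba`; `values_to = "value"` is spelled out even though it is the default.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for `prueba` (values are illustrative)
toy <- data.frame(
  client1 = c(6.3, 7.2, 8.0, 8.0),
  client2 = c(7.2, 6.5, 6.4, 7.4),
  client3 = c(7.0, 3.5, 8.0, 7.0),
  type    = factor(c("B", "C", "A", "A"))
)

# Stack the three client columns: names go to `client`, numbers go to `value`
toy_long <- pivot_longer(toy,
                         cols = starts_with("client"),
                         names_to = "client",
                         values_to = "value")

# 4 rows x 3 client columns become 12 rows with columns type, client, value
dim(toy_long)

# One box per client within each level of `type`
p <- ggplot(toy_long, aes(x = type, y = value, fill = client)) +
  geom_boxplot()
```

Printing `p` draws the plot. With so few rows the boxes are not meaningful; the point is only the shape of the long data that ggplot expects.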

I am not sure I understand you correctly, but if you want one box for each client instead of one for each level of cat, then you have to convert the data to long format:

prueba <- data.frame("client1" = truncnorm::rtruncnorm(n = 60, a = 1, b = 9.8, mean = 6.5, sd = 1),
                     "client2" = truncnorm::rtruncnorm(n = 60, a = 1, b = 9.8, mean = 6.9, sd = 2),
                     "client3" = truncnorm::rtruncnorm(n = 60, a = 1, b = 9.8, mean = 5, sd = 3),
                     "type" = as.factor(sample(LETTERS[1:3], 60, replace = T, prob = c(0.4,0.35,0.25))),
                     "cat" = as.factor(sample(LETTERS[20:22], 60, replace = T, prob = c(0.5, 0.1,0.4))))
prueba[,1:3] <- round(prueba[,1:3], 1)

library(reshape2)
library(ggplot2)

prueba_long <- melt(prueba,  id.vars = c('type', 'cat'))

ggplot(prueba_long, aes(x = type, y = value, colour = variable)) +
  geom_boxplot()
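As a small sketch of what melt produces, assuming a made-up two-row data frame `toy` in place of `prueba`: by default the stacked column names land in `variable` and the numbers in `value`, which is why the plot above maps `colour = variable`. The `variable.name` argument can rename that column, for example to `client`.

```r
library(reshape2)

# Hypothetical toy data standing in for `prueba` (values are illustrative)
toy <- data.frame(
  client1 = c(6.3, 7.2),
  client2 = c(7.2, 6.5),
  type    = factor(c("B", "C")),
  cat     = factor(c("T", "V"))
)

# Default column names: `variable` and `value`
toy_default <- melt(toy, id.vars = c("type", "cat"))
names(toy_default)

# Same reshape, with the stacked column renamed to `client`
toy_named <- melt(toy, id.vars = c("type", "cat"), variable.name = "client")
names(toy_named)
```

Note that `colour` in geom_boxplot changes the outline of each box, while `fill` changes its interior, so the two answers differ only in how the clients are distinguished visually.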

