What are the aes() values when making a boxplot using the ggplot package?

I'm trying to make a boxplot with the ggplot2 package in RStudio. I've been reading through past ggplot2 questions, but this is so basic that I can't find it covered in detail... I'm not very good at R.

This is the very basic code I'm trying to use, but I don't know what my x and y values should be:

ggplot(data, aes(x,y)) + geom_boxplot()

My y values are Pearson coefficients, each between 0 and 1, but I'm struggling to put that in as a range. I'm also confused because my x values are just 4 different conditions. Should I use a vector? E.g. c(drug 6hr, control, drug 24hr, control)

I successfully made a basic boxplot using boxplot(), but I am using ggplot2 because I want to show every individual value on the plot using jitter, which I have also failed to get working.

Sorry, I have only been using R for about 6 months! Trying to learn as much as I can.

My data:

drug 6hr, control, drug 24hr, control
0.876   0.707   0.709   0.521
0.084   0.275   0.468   0.795
0.911   0.985   0.565   0.150
0.503   0.584   0.693   0.766
0.363   0.102   0.775   0.640
0.219   0.888   0.724   0.516
0.041   0.277   0.877   0.216
0.206   0.974   0.771   0.434
0.787   0.725   0.671   0.916
0.896   0.873   0.443   0.693
0.396   0.641   0.525   0.471
0.250   0.184   0.467   0.537
0.094   0.453   0.641   0.910
0.750   0.748   0.634   0.007
0.026   0.263   0.069   0.725
0.109           0.227   0.535
0.780           0.811   0.241
0.710           0.568   0.029
0.676           0.114   0.237
0.610           0.260   0.241
0.170           0.728   0.405
0.025           0.815   0.914
0.022           0.329   0.766
0.039           0.714
0.034           0.096
0.402           0.988
0.649
0.564
0.190
0.844
0.920
0.744
0.871
0.565

You need to reshape your dataframe into a longer format; that makes it easier to get your boxplot with ggplot2.

Here, I'm using the pivot_longer function from the tidyr package to transform your data into two columns, the first holding the name of the condition and the second holding the values:

library(tidyr)
library(dplyr)
DF %>% pivot_longer(everything(), names_to = "var",values_to = "values") 

# A tibble: 136 x 2
   var        values
   <chr>       <dbl>
 1 drug_6hr    0.876
 2 Control_6   0.707
 3 drug_24hr   0.709
 4 Control_24  0.521
 5 drug_6hr    0.084
 6 Control_6   0.275
 7 drug_24hr   0.468
 8 Control_24  0.795
 9 drug_6hr    0.911
10 Control_6   0.985
# … with 126 more rows
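One thing to be aware of: since var is a character column, ggplot2 will place the boxes in alphabetical order on the x axis. If you would rather keep the order of the original columns, one option is to convert var to a factor with explicit levels. This is only a minimal sketch on the first three rows of two of your columns, and the name toy is made up for illustration:

```r
library(tidyr)
library(dplyr)

# First three rows of two columns from the question, for illustration only
toy <- data.frame(
  drug_6hr  = c(0.876, 0.084, 0.911),
  Control_6 = c(0.707, 0.275, 0.985)
)

long <- toy %>%
  pivot_longer(everything(), names_to = "var", values_to = "values") %>%
  # Use the original column order as the factor levels instead of alphabetical order
  mutate(var = factor(var, levels = names(toy)))

levels(long$var)
```

With the full data you would use names(DF) in place of names(toy), or write the four condition names out in whatever order you prefer.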

Then, you can add the plotting step to the pipe (%>%) sequence by passing your dataframe to ggplot, mapping x to the condition column and y to the values column inside aes, and adding the geom_boxplot and geom_jitter functions:

library(tidyr)
library(dplyr)
library(ggplot2)
DF %>% pivot_longer(everything(), names_to = "var",values_to = "values") %>%
  ggplot(aes(x = var, y = values, fill = var, color = var))+
  geom_boxplot(alpha = 0.2)+
  geom_jitter()
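Two optional tweaks may be worth considering here, though whether you want them is a matter of taste. By default geom_jitter also moves the points a little vertically, so they no longer sit exactly at their true values, and geom_boxplot draws outliers itself, so those points appear twice. A rough sketch, using a few of your values already in long format (the width of 0.2 is an arbitrary choice):

```r
library(ggplot2)

# A few values from the question, already in long format, for illustration only
long <- data.frame(
  var    = rep(c("drug_6hr", "Control_6"), each = 5),
  values = c(0.876, 0.084, 0.911, 0.503, 0.363,
             0.707, 0.275, 0.985, 0.584, 0.102)
)

p <- ggplot(long, aes(x = var, y = values, fill = var, color = var)) +
  # outlier.shape = NA hides the boxplot's own outlier points,
  # because geom_jitter() already draws every observation
  geom_boxplot(alpha = 0.2, outlier.shape = NA) +
  # height = 0 keeps each point at its true y value; only x is jittered
  geom_jitter(width = 0.2, height = 0)

# print(p) draws the plot
```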

Alternatively, to get rid of the warning messages caused by the NA values, you can drop those rows by adding a filter call between pivot_longer and ggplot:

DF %>% pivot_longer(everything(), names_to = "var",values_to = "values") %>%
  filter(!is.na(values)) %>%
  ggplot(aes(x = var, y = values, fill = var, color = var))+
  geom_boxplot(alpha = 0.2)+
  geom_jitter()
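As far as I know, pivot_longer can also drop the NA rows itself through its values_drop_na argument, which should give the same result as the filter step. A small sketch on made-up padding (the NA below is placed artificially to mimic the shorter columns in your data, and toy is a hypothetical name):

```r
library(tidyr)

# Toy subset; the NA mimics the padding of the shorter columns
toy <- data.frame(
  drug_6hr  = c(0.876, 0.084, 0.911),
  Control_6 = c(0.707, 0.275, NA)
)

long <- pivot_longer(toy, everything(),
                     names_to = "var", values_to = "values",
                     values_drop_na = TRUE)  # rows with NA values are dropped

nrow(long)  # 5 rows: six cells minus the one NA
```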


Does it answer your question?

Reproducible example

I edited your example to make it easier to read into R. I also modified the column names, as pointed out by @akrun:

DF <- structure(list(drug_6hr = c(0.876, 0.084, 0.911, 0.503, 0.363, 
0.219, 0.041, 0.206, 0.787, 0.896, 0.396, 0.25, 0.094, 0.75, 
0.026, 0.109, 0.78, 0.71, 0.676, 0.61, 0.17, 0.025, 0.022, 0.039, 
0.034, 0.402, 0.649, 0.564, 0.19, 0.844, 0.92, 0.744, 0.871, 
0.565), Control_6 = c(0.707, 0.275, 0.985, 0.584, 0.102, 0.888, 
0.277, 0.974, 0.725, 0.873, 0.641, 0.184, 0.453, 0.748, 0.263, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), drug_24hr = c(0.709, 0.468, 0.565, 0.693, 0.775, 
0.724, 0.877, 0.771, 0.671, 0.443, 0.525, 0.467, 0.641, 0.634, 
0.069, 0.227, 0.811, 0.568, 0.114, 0.26, 0.728, 0.815, 0.329, 
0.714, 0.096, 0.988, NA, NA, NA, NA, NA, NA, NA, NA), Control_24 = c(0.521, 
0.795, 0.15, 0.766, 0.64, 0.516, 0.216, 0.434, 0.916, 0.693, 
0.471, 0.537, 0.91, 0.007, 0.725, 0.535, 0.241, 0.029, 0.237, 
0.241, 0.405, 0.914, 0.766, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA)), row.names = c(NA, -34L), class = c("data.table", "data.frame"
))
