How to plot columns of csv data in R using boxplots

I have a sample dataframe that is 3600 rows long by 6 columns wide. I want to create a plot in R that shows six boxplots, one for each of the 6 columns of data. I am using ggplot. I can create them in Excel easily enough (shown below), but I want to be able to do it in R because my future dataframes are going to be much larger and R seems to handle large datasets more easily.

[image: Excel plot]

Using the code below I can plot the first column fine, but I can't figure out how to add the data from the other 5 columns.

ggplot(data=df)+
 geom_boxplot(aes(x="Label", y=col1))

Using geom_boxplot from ggplot2

To get a boxplot for each of your 6 columns with ggplot2, you first need to reshape your dataframe into a longer format to match the grammar of ggplot2 (one column for x values, one column for y values, and one or more columns for categorical values). Then you can use ggplot2 and its geom_boxplot function.

Here is an example using the built-in iris dataset:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Using the pivot_longer function from the tidyr package, you can reshape the first 4 columns of this dataset into a longer format:

library(tidyr)
library(dplyr)
iris2 <- iris %>%
  pivot_longer(cols = Sepal.Length:Petal.Width,
               names_to = "Var", values_to = "val")

# A tibble: 600 x 3
   Species Var            val
   <fct>   <chr>        <dbl>
 1 setosa  Sepal.Length   5.1
 2 setosa  Sepal.Width    3.5
 3 setosa  Petal.Length   1.4
 4 setosa  Petal.Width    0.2
 5 setosa  Sepal.Length   4.9
 6 setosa  Sepal.Width    3  
 7 setosa  Petal.Length   1.4
 8 setosa  Petal.Width    0.2
 9 setosa  Sepal.Length   4.7
10 setosa  Sepal.Width    3.2
# … with 590 more rows

Then you can use this new dataset in ggplot2 to get a boxplot for each value of Var:

library(ggplot2)
ggplot(iris2, aes(x = Var, y = val, fill  = Var))+
  geom_boxplot()
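
The same steps can be sketched for a dataframe shaped like the one in the question. The data here is simulated, and the column names col1 to col6 are assumptions based on the col1 used in the question's code; with real data you would read your file instead of generating random numbers.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data: 3600 rows, 6 numeric columns.
set.seed(1)
df <- as.data.frame(matrix(rnorm(3600 * 6), ncol = 6))
names(df) <- paste0("col", 1:6)

# Reshape all six columns into name/value pairs (3600 * 6 = 21600 rows).
df_long <- pivot_longer(df, cols = everything(),
                        names_to = "Var", values_to = "val")

# One box per original column.
p <- ggplot(df_long, aes(x = Var, y = val, fill = Var)) +
  geom_boxplot()
print(p)
```

If the dataframe also holds columns that should not be plotted, cols can list only the wanted ones, as in the iris example above.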



Alternative using base R

Without reshaping your dataframe, you can get the boxplots right away with the boxplot function in base R:

boxplot(iris[,c(1:4)], col = c("red","green","blue","orange"))
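
Since the question starts from a csv file, here is a minimal sketch of the base R route from file to plot. The file is a temporary one written from simulated data, and the names col1 to col6 are assumptions; point read.csv at your own file instead.

```r
# Hypothetical stand-in for the asker's CSV: 3600 rows, 6 numeric columns.
set.seed(1)
df <- as.data.frame(matrix(rnorm(3600 * 6), ncol = 6))
names(df) <- paste0("col", 1:6)
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Read the file back and draw one box per column.
dat <- read.csv(csv_path)
bp <- boxplot(dat, col = rainbow(ncol(dat)), ylab = "value")
```

boxplot also returns the summary statistics it drew, so bp$stats holds the whisker, hinge and median values for each column.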


Does this answer your question?
