
Plotting distributions of all columns in an R data frame

I'm trying to come up with a clean way to plot a grid view of all the columns in an R data frame. The problem is that my data frame contains both discrete and numeric values. For simplicity's sake, we can use iris, the sample dataset that ships with R. I could use par(mfrow = c(x, y)) to split the plotting area and maybe mapply to cycle through each column, but I'm unsure what's best here.

I'm thinking something akin to:

ggplot(iris, aes(Sepal.Length))+geom_density()

But instead plotted for each column. My concern is that the "Species" column is discrete. Maybe geom_density wouldn't be the right plot to use there, but the idea is to see the distribution of each of the data frame's variables in one plot, even the discrete ones. Bar plots for the discrete values would serve the purpose. Basically I'm trying to do the following:

  • Cycle through each column in the data frame
  • If numeric, plot a histogram
  • If discrete (a string basically), plot a bar plot

Any thoughts or advice would be appreciated!

You can use the function plot_grid from the cowplot package. This function takes a list of plots generated by ggplot and creates a new plot, combining them in a grid.

First, create a list of plots with lapply , using geom_density for numeric variables and geom_bar for everything else.

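library(ggplot2)
library(cowplot)

# Build one plot per column: density for numeric columns, bar chart otherwise
my_plots <- lapply(names(iris), function(var_x) {
  # .data[[var_x]] looks the column up by name (aes_string() is deprecated)
  p <- ggplot(iris, aes(.data[[var_x]])) +
    labs(x = var_x)

  if (is.numeric(iris[[var_x]])) {
    p <- p + geom_density()
  } else {
    p <- p + geom_bar()
  }

  p  # return the plot explicitly
})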

Now we simply call plot_grid .

plot_grid(plotlist = my_plots)
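
If you would rather stay with base graphics, as the par(mfrow = ...) idea in the question suggests, something like the following might work. It is only a sketch: plot_all_columns is a hypothetical helper name (not part of any package), the near-square grid layout is an assumption, and it draws histograms for numeric columns as the question asks, rather than the density curves used above.

```r
# Hypothetical helper (not from any package): one panel per column,
# using base graphics only.
plot_all_columns <- function(df) {
  n <- ncol(df)
  n_cols <- ceiling(sqrt(n))     # assumed layout: roughly square grid
  n_rows <- ceiling(n / n_cols)

  # Split the device into a grid and restore the old setting on exit.
  old_par <- par(mfrow = c(n_rows, n_cols))
  on.exit(par(old_par))

  kinds <- vapply(names(df), function(col_name) {
    x <- df[[col_name]]
    if (is.numeric(x)) {
      hist(x, main = col_name, xlab = col_name)
      "histogram"
    } else {
      # table() counts each distinct value (factor or character column).
      barplot(table(x), main = col_name, ylab = "count")
      "barplot"
    }
  }, character(1))

  # Return which plot type was used for each column, without printing it.
  invisible(kinds)
}

plot_all_columns(iris)
```

The function returns, invisibly, a named character vector recording which plot type was drawn for each column, which can help when checking a wider data frame.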
