
Subset and plot data by for loop / lapply

I have about 300 sites located across several mountain types. I am trying to produce some meaningful plots, so I would like to subset my data by mountain type (type) and plot each subset with ggplot2. I would like to automate the process with a for loop or with lapply, but I am a beginner with both.

I have found some good examples using a for loop (http://www.reed.edu/data-at-reed/resources/R/loops_with_ggplot2.html) and using lapply (the question "Use for loop in ggplot2 to generate a list").

However, both approaches generate empty plots. What am I doing wrong? How can I fix my code?

# Create dummy data
df<- data.frame(loc = rep(c("l1", "l2"), each = 3),
                name = rep(c("A", "B"), 3),
                grid = c(5,6,7,2,3,5),
                area = c(5,10,1,1,3,1),
                areaOrig = rep(c(20, 10, 5), each = 2))

df2<-rbind(df, df)

# Create two mountain types
df2$type = rep(c("y", "z"), each = 6)

Create a function to produce the plots:

require(ggplot2)

type.graph <- function(df2, na.rm = TRUE, ...) {

  # Create list of mountain types
  type_list <-unique(df2$type)

  # Create a for loop to produce ggplot plots
  for (i in seq_along(type_list)) {

    # create a plot for each type in df2
    plot<-

      windows()

      ggplot(subset(df2, df2$type == type_list[i]),
             aes(x = grid, 
                 y = area)) +
        geom_bar(stat = "identity") +
        ggtitle(type_list[i]) +
        facet_grid(loc ~name)

    print(plot)
  }
}

type.graph(df2)

Use lapply to produce plots:

# unique mountain types
type_list <- unique(df2$type)

#create list of ggplots per type
p_re <-
  lapply(type_list, function(i){

    ggplot(subset(df2, type == type_list[i]), 
           aes(x = grid, 
               y = area)) +
      geom_bar(stat = "identity")

  })

#assign names
names(p_re) <- type_list

#plot
p_re$y

I would suggest using the purrr package (part of the tidyverse): nest the data frame by the grouping factor, then loop through the nested subsets. Below is an example:

library(tidyverse)

by_type <- df2 %>% 
  group_by(type) %>% 
  nest() %>% 
  mutate(plot = map(data, 
                    ~ggplot(. ,aes(x = grid, y = area)) +
                      geom_bar(stat = "identity") +
                      facet_grid(loc ~name)))

by_type
# A tibble: 2 x 3
  type  data             plot    
  <chr> <list>           <list>  
1 y     <tibble [6 × 5]> <S3: gg>
2 z     <tibble [6 × 5]> <S3: gg>

The above gives you a normal data frame, except that the data and plot columns are list columns. The first "cell" of data contains all the data for type == y, and the second contains all the data for type == z. This basic structure is created by tidyr::nest. You then create a new variable, which I've called plot, by looping through the data list column with purrr::map; inside the formula, . stands in for the data argument. Note that there are map2 and pmap functions for when you want to loop through more than one thing at a time (for example, if you wanted each plot to have a different title).
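A minimal sketch of the map2 variant mentioned above, assuming dplyr, tidyr, purrr and ggplot2 are installed: the type column is passed alongside each nested data frame, so every plot gets its own type as the title. The ungroup() call is optional; it simply lets map2() run once over whole columns instead of once per group.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

# Dummy data from the question
df <- data.frame(loc = rep(c("l1", "l2"), each = 3),
                 name = rep(c("A", "B"), 3),
                 grid = c(5, 6, 7, 2, 3, 5),
                 area = c(5, 10, 1, 1, 3, 1),
                 areaOrig = rep(c(20, 10, 5), each = 2))
df2 <- rbind(df, df)
df2$type <- rep(c("y", "z"), each = 6)

by_type <- df2 %>%
  group_by(type) %>%
  nest() %>%
  ungroup() %>%
  # map2() walks over two columns in parallel:
  # .x is a nested data frame, .y is the matching type label
  mutate(plot = map2(data, type,
                     ~ ggplot(.x, aes(x = grid, y = area)) +
                       geom_bar(stat = "identity") +
                       ggtitle(.y) +
                       facet_grid(loc ~ name)))

by_type$plot[[1]]  # the plot for type "y"
```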

You can then easily look at your plots with by_type$plot, or save them with:

walk2(by_type$type, by_type$plot, 
      ~ggsave(paste0(.x, ".pdf"), .y))


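Try this. In your version, the assignment plot <- windows() stores whatever windows() returns (not a plot), so the ggplot built on the following lines is created and immediately discarded, and print(plot) has nothing useful to draw. Build and assign the plot first, then open the new window and print it: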

require(ggplot2)

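type.graph <- function(df2, na.rm = TRUE, ...) {

  # Create list of mountain types
  type_list <- unique(df2$type)

  # Loop over the types to produce one ggplot per type
  for (i in seq_along(type_list)) {

    # Build and store the plot for the current type first...
    plot <- ggplot(subset(df2, df2$type == type_list[i]),
                   aes(x = grid,
                       y = area)) +
      geom_bar(stat = "identity") +
      ggtitle(type_list[i]) +
      facet_grid(loc ~ name)

    # ...then open a new graphics window and draw it.
    # windows() is Windows-only; dev.new() is the portable alternative.
    # print() is required because nothing auto-prints inside a loop.
    windows()
    print(plot)
  }
}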

type.graph(df2)
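The lapply attempt in the question fails for a related reason: lapply() passes each element of type_list (the strings "y" and "z") to the function, not its position, so type_list[i] becomes type_list["y"], which is NA for an unnamed vector. subset() treats the NA condition as FALSE, keeps no rows, and the plot is empty. A minimal sketch of a fix that uses the element directly (the argument name mtype is just an illustrative choice):

```r
library(ggplot2)

# Dummy data from the question
df <- data.frame(loc = rep(c("l1", "l2"), each = 3),
                 name = rep(c("A", "B"), 3),
                 grid = c(5, 6, 7, 2, 3, 5),
                 area = c(5, 10, 1, 1, 3, 1),
                 areaOrig = rep(c(20, 10, 5), each = 2))
df2 <- rbind(df, df)
df2$type <- rep(c("y", "z"), each = 6)

type_list <- unique(df2$type)

# lapply() hands the function each element of type_list ("y", then "z"),
# so the element itself is used for subsetting and as the title
p_re <- lapply(type_list, function(mtype) {
  ggplot(subset(df2, type == mtype), aes(x = grid, y = area)) +
    geom_bar(stat = "identity") +
    ggtitle(mtype) +
    facet_grid(loc ~ name)
})
names(p_re) <- type_list

# Draw every plot; print() is needed inside a loop
for (p in p_re) print(p)
```

Looping over seq_along(type_list) and keeping type_list[i] inside the function would work equally well.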

Several years ago, before the tidyverse, I used ggplot2 to produce a list of plot objects in a similar way to yours. At the end of the custom function I put an explicit return() statement to return the created plot object rather than just printing it. That worked for me (for example, it let me run ggsave() later).

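Here is an example: a custom histogram function, where cl is the name of the column to plot, df is the main dataset, and ymax and st are extra parameters (the height of the reference lines and a table of summary statistics):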

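ggHistFunc <- function (cl, df, ymax, st) {
    # Look up the summary statistics for column cl in the stats table st
    mn  <- st$means[st$variable == cl]
    P50 <- st$medians[st$variable == cl]
    P10 <- st$P10[st$variable == cl]
    P90 <- st$P90[st$variable == cl]
    # Two points forming a vertical segment at x, from y = 0 to y = ymax
    vline_df <- function(x) data.frame(x = c(x, x), y = c(0, ymax))
    # .data[[cl]] replaces the deprecated aes_string(); after_stat(count)
    # replaces ..count..; linewidth needs ggplot2 >= 3.4.0 (use size before)
    gghist <-
        ggplot(data = df, aes(x = .data[[cl]])) +
        geom_histogram(binwidth = diff(range(df[[cl]])) / 10,
                       aes(y = after_stat(count)),
                       fill = "white", colour = "black") +
        geom_line(data = vline_df(mn),  aes(x = x, y = y), colour = "green", linewidth = 1) +
        geom_line(data = vline_df(P50), aes(x = x, y = y), colour = "brown", linewidth = 1) +
        geom_line(data = vline_df(P10), aes(x = x, y = y), colour = "blue",  linewidth = 1) +
        geom_line(data = vline_df(P90), aes(x = x, y = y), colour = "red",   linewidth = 1)
    #print(gghist)
    # Return the plot object so it can be stored in a list and saved later
    return(gghist)
}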

This was followed by a "loop" (via lapply) that creates a histogram for every parameter:

gg_Hist_HM <- lapply(X = as.list(names(params_HM)),
                     FUN = ggHistFunc, df = params_HM, ymax = 100, st = stat_HM)
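Applied to the question's df2, the same return-the-object pattern might look like the sketch below. make_type_plot is a hypothetical helper name, and the PDFs are written to tempdir() only so that the sketch does not clutter the working directory.

```r
library(ggplot2)

# Dummy data from the question
df <- data.frame(loc = rep(c("l1", "l2"), each = 3),
                 name = rep(c("A", "B"), 3),
                 grid = c(5, 6, 7, 2, 3, 5),
                 area = c(5, 10, 1, 1, 3, 1),
                 areaOrig = rep(c(20, 10, 5), each = 2))
df2 <- rbind(df, df)
df2$type <- rep(c("y", "z"), each = 6)

# Hypothetical helper: build the plot for one mountain type and return it
# (nothing is printed here, so the caller decides what to do with it)
make_type_plot <- function(mtype, data) {
  p <- ggplot(data[data$type == mtype, ], aes(x = grid, y = area)) +
    geom_bar(stat = "identity") +
    ggtitle(mtype) +
    facet_grid(loc ~ name)
  return(p)
}

types <- unique(df2$type)
plots <- lapply(types, make_type_plot, data = df2)
names(plots) <- types

# Because the plot objects were returned, they can be saved later
out_files <- file.path(tempdir(), paste0("type_", types, ".pdf"))
for (k in seq_along(plots)) {
  ggsave(out_files[k], plot = plots[[k]], width = 6, height = 4)
}
```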

Looking at it now, the purrr approach proposed above is more elegant!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM