
R: combine mpg trans columns into new dataframe containing two columns

I am working my way through the R for Data Science manual and am currently finishing chapter 3. I am trying to produce a plot that combines the different types of automatic and manual transmission into two panels, instead of what I currently have:

# Install necessary packages (only needed once)
install.packages("tidyverse")
library(tidyverse)

# Create the plot
fuelbytrans <- ggplot(data = mpg) +
  geom_jitter(
    mapping = aes(x = displ, y = hwy, colour = fl),
    size = 0.75) +
  # Change labels for title and x and y axes
  labs(
    title = "Fuel consumption in the «mpg» data set by transmission and engine displacement",
    x = "Engine displacement",
    y = "US miles per gallon")

# Run it
fuelbytrans



# Set colours and labels for fuel legend and position it on the bottom
# e (ethanol), d (diesel), r (regular [petrol, low octane]), p (premium [petrol, high octane]),
# c (CNG)
cols <- c( # source: http://colorbrewer2.org/#type=diverging&scheme=PRGn&n=5
  "c" = "yellow",
  "d" = "red",
  "e" = "black",
  "p" = "blue",
  "r" = "darkgreen"
)
labels_fuel <- fuelbytrans +
  scale_colour_manual(
    name = "Fuel",
    values = cols,
    breaks = c("c", "d", "e", "p", "r"),
    labels = c("CNG",
               "diesel",
               "ethanol",
               "petrol,\nhigh octane",
               "petrol,\nlow octane")) +
  theme(legend.position = "bottom",
        legend.background = element_rect(
          fill = "gray90",
          size = 2,  # in ggplot2 >= 3.4.0 this argument is called `linewidth`
          linetype = "dotted"
        ))

# Run it
labels_fuel



# Wrap by transmission type
labels_fuel + facet_wrap(~ trans, nrow = 1)

As you can see, I get eight panels for automatic transmission and two for manual. What I would like is just two panels, one for automatic and one for manual, each combining the corresponding plots. I currently have no idea how to do this and would appreciate any help.

If any information is missing, should have been written differently, or could otherwise be improved, please advise.

I am running RStudio 0.99.902. I am quite new to R.

You have more than 2 types of transmission in your data:

table(mpg$trans)

# auto(av)   auto(l3)   auto(l4)   auto(l5)   auto(l6) 
#        5          2         83         39          6 
# auto(s4)   auto(s5)   auto(s6) manual(m5) manual(m6) 
#        3          3         16         58         19

You need to collapse them into two groups first. Here is one option:

mpg = mpg %>% 
  mutate(trans2 = if_else(grepl("auto", trans), "auto", "manual"))

table(mpg$trans2)

# auto manual 
# 157     77
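grepl("auto", trans) is TRUE whenever "auto" appears anywhere in the code, and every non-match is labelled "manual". A stricter variant, sketched here as an alternative rather than a correction, classifies by prefix with dplyr::case_when() and base startsWith(), so any unexpected code would become NA instead of silently being called "manual". The name mpg_grouped is hypothetical and avoids overwriting mpg.

```r
library(tidyverse)

# Classify explicitly by prefix; rows matching neither condition become NA
mpg_grouped <- mpg %>%
  mutate(trans2 = case_when(
    startsWith(trans, "auto")   ~ "auto",
    startsWith(trans, "manual") ~ "manual"
  ))

# Should give the same split as above: auto 157, manual 77
count(mpg_grouped, trans2)
```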

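Then use the new trans2 variable for faceting, i.e. facet_wrap(~ trans2) instead of facet_wrap(~ trans). You need to re-create the plot from the start, because fuelbytrans was built from the old copy of mpg, which has no trans2 column.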
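A minimal sketch of the re-created plot, assuming the tidyverse is installed. mpg2 and fuelbytrans2 are hypothetical names chosen so the original mpg stays untouched; the scale_colour_manual() and theme() layers from the question can be added back unchanged.

```r
library(tidyverse)

# Add the two-level transmission variable (same rule as in the answer)
mpg2 <- mpg %>%
  mutate(trans2 = if_else(grepl("auto", trans), "auto", "manual"))

# Build the plot from the new data and facet on trans2: one panel per group
fuelbytrans2 <- ggplot(data = mpg2) +
  geom_jitter(mapping = aes(x = displ, y = hwy, colour = fl), size = 0.75) +
  facet_wrap(~ trans2, nrow = 1) +
  labs(x = "Engine displacement", y = "US miles per gallon")

fuelbytrans2
```

With nrow = 1 the two panels sit side by side, one for automatic and one for manual.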

Two more comments:

  1. If you want to know more about an R function, run ?function_name in R. This brings up the help page for that function, which usually includes examples you can run to see it in action. (Here we are using grepl, so it would also be useful to look up "regular expressions" if you are not familiar with them.)

  2. Since you are reading r4ds, you should get familiar with the "pipe operator" used in dplyr, tidyr and other tidyverse packages sooner rather than later. It chains multiple function calls together in an easily readable way. Google it or take a look here. The call could also be written without the pipe like this:

     mpg = mutate(mpg, trans2 = if_else(grepl("auto", trans), "auto", "manual")) 

In this particular case, the pipe operator is actually not that useful. I am just so used to it that I reached for it automatically.
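For comparison, a sketch of the same transformation written four ways. The native |> pipe assumes R 4.1.0 or later (older versions will fail to parse it), and the base-R version needs no dplyr at all. All variable names are illustrative.

```r
library(dplyr)    # mutate(), if_else() and the re-exported %>%
library(ggplot2)  # provides the mpg data set

# 1. magrittr pipe, as in the answer
with_pipe <- mpg %>% mutate(trans2 = if_else(grepl("auto", trans), "auto", "manual"))

# 2. Plain nested call, no pipe
nested <- mutate(mpg, trans2 = if_else(grepl("auto", trans), "auto", "manual"))

# 3. Native pipe (assumes R >= 4.1.0)
native <- mpg |> mutate(trans2 = if_else(grepl("auto", trans), "auto", "manual"))

# 4. Base R only: assign a new column with ifelse()
base_version <- mpg
base_version$trans2 <- ifelse(grepl("auto", base_version$trans), "auto", "manual")
```

All four give the same trans2 column; which one to use is mostly a matter of readability.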
