Scatter plot R for multiple values

Question

I am having trouble making a scatter plot of my data. I have one independent variable, "Strain", for which I have 3 explanatory values. Here is the structure of the data frame:

'data.frame':   30 obs. of  4 variables:
 $ Strain       : Factor w/ 30 levels "1","10","11",..: 1 12 14 15 25 27 28 29 30 2 ...
 $ second_hour  : Factor w/ 30 levels "10356.3888888889",..: 15 16 8 14 7 6 11 10 13 12 ...
 $ second_hour_n: Factor w/ 30 levels "10149.4751953184",..: 5 4 15 6 18 19 13 14 9 12 ...
 $ Beula        : num  21674 21308 19905 20817 20017 ...

> head(hour_2)
  Strain      second_hour    second_hour_n    Beula
1      1 19354.4444444444 12103.3628274451 21673.72
2      2 20021.2222222222 11577.7991047524 21307.61
3      3 16105.9444444444 14425.8808435683 19905.39
4      4 18993.3888888889 12149.3204615723 20816.78
5      5 15541.3888888889 15370.8433645383 20016.94
6      6 14767.1666666667 16288.3635541566 19000.44

I would like a scatter plot that shows each explanatory value for each strain, color coded.

In my current attempt I first melt the dataframe using the following code:

> hour_2_melted <- melt(hour_2, id.vars = "Strain")
Warning message:
attributes are not identical across measure variables; they will be dropped

Then I plot

ggplot(hour_2_melted, aes(Strain, value)) + geom_point()

However, the y axis cannot be changed because it's continuous, and I do not want each value to be shown on the y axis. Also, the x axis is in a strange order. Lastly, how do I color code the 3 different explanatory values?

Any help is appreciated.

Answer 1

You can use the pivot_longer function from the tidyr package to reshape your data into the long format that ggplot2 expects:

library(tidyr)
library(dplyr)
df %>% pivot_longer(., - Strain, names_to = "Variable", values_to = "Value")

# A tibble: 18 x 3
   Strain Variable       Value
    <int> <chr>          <dbl>
 1      1 second_hour   19354.
 2      1 second_hour_n 12103.
 3      1 Beula         21674.
 4      2 second_hour   20021.
 5      2 second_hour_n 11578.
 6      2 Beula         21308.
 7      3 second_hour   16106.
 8      3 second_hour_n 14426.
 9      3 Beula         19905.
10      4 second_hour   18993.
11      4 second_hour_n 12149.
12      4 Beula         20817.
13      5 second_hour   15541.
14      5 second_hour_n 15371.
15      5 Beula         20017.
16      6 second_hour   14767.
17      6 second_hour_n 16288.
18      6 Beula         19000.
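
To see what this reshape does without any package, here is a rough base R sketch using stack(). It is only an illustration on the first three rows of the example data, not a replacement for pivot_longer, and the names stacked and long are made up for this example. Note that the rows come out grouped by variable rather than by strain.

```r
# First three rows of the example data from this answer
df <- data.frame(
  Strain = 1:3,
  second_hour = c(19354.4444444444, 20021.2222222222, 16105.9444444444),
  second_hour_n = c(12103.3628274451, 11577.7991047524, 14425.8808435683),
  Beula = c(21673.72, 21307.61, 19905.39)
)

# stack() puts all measure columns into one "values" column and records
# the source column name in "ind"
stacked <- stack(df[, -1])

# Repeat Strain once per measure column so each value keeps its strain
long <- data.frame(
  Strain = rep(df$Strain, times = ncol(df) - 1),
  Variable = stacked$ind,
  Value = stacked$values
)
long
```

Each original row contributes one row per measured variable, which is the shape needed to map Variable to color.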

For plotting, you can pipe the reshaped data straight into ggplot:

library(tidyr)
library(dplyr)
library(ggplot2)
df %>% pivot_longer(., - Strain, names_to = "Variable", values_to = "Value") %>%
  ggplot(aes(x = Strain, y = Value, color = Variable))+
  geom_point()

Regarding the order of the x axis: using the code from my answer and the reproducible example I provided (see below), I can't reproduce your issue, even if I convert Strain to a factor before reshaping the data frame:

library(tidyr)
library(dplyr)
library(ggplot2)
df$Strain <- as.factor(df$Strain)
df %>% pivot_longer(., - Strain, names_to = "Variable", values_to = "Value") %>%
  ggplot(aes(x = Strain, y = Value, color = Variable))+
  geom_point()
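
One possible explanation for the difference, sketched in base R: a factor built from integers keeps numeric order, while a factor built from text sorts its levels as strings. The snippet uses the Strain values from "Data 2" below; the variable names are made up for illustration, and whether this matches how your data was imported is an assumption.

```r
# Strain values from "Data 2" in this answer
strain_int <- c(1L, 2L, 21L, 44L, 5L, 6L)

# Factor built from integers: levels follow numeric order
levels_from_int <- levels(as.factor(strain_int))
levels_from_int

# Factor built from text (e.g. after reading the column as strings):
# levels are sorted as strings, so "21" and "44" land before "5"
strain_chr <- as.character(strain_int)
levels_from_chr <- levels(factor(strain_chr))
levels_from_chr

# Explicit levels in numeric order repair the text-based factor
strain_fixed <- factor(
  strain_chr,
  levels = as.character(sort(as.numeric(unique(strain_chr))))
)
levels(strain_fixed)
```

If you want to keep Strain as a discrete axis, setting the levels explicitly like this is one way to control the order ggplot uses.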

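However, based on the structure of your data frame, I would recommend converting your factor columns to numeric values. Strain, second_hour and second_hour_n are stored as factors, which would explain both the alphabetical x-axis order ("1", "10", "11", ...) and a y axis that lists every single value: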

hour_2$Strain <- as.numeric(as.vector(hour_2$Strain))
hour_2$second_hour <- as.numeric(as.vector(hour_2$second_hour))
hour_2$second_hour_n <- as.numeric(as.vector(hour_2$second_hour_n))
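
The detour through as.vector() (or as.character()) matters. As a small sketch with made-up, rounded values, calling as.numeric() directly on a factor returns the internal level codes rather than the numbers you see printed:

```r
# A factor holding numbers as text, similar in spirit to second_hour
f <- factor(c("19354.44", "20021.22", "16105.94"))

# Pitfall: as.numeric() on a factor gives the position of each value
# among the sorted levels, not the value itself
codes <- as.numeric(f)
codes

# Converting to character first recovers the real values
values <- as.numeric(as.character(f))
values
```

Here codes holds small integers describing level positions, while values holds the original measurements.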

Does this answer your question?

Data

structure(list(Strain = 1:6, second_hour = c(19354.4444444444, 
20021.2222222222, 16105.9444444444, 18993.3888888889, 15541.3888888889, 
14767.1666666667), second_hour_n = c(12103.3628274451, 11577.7991047524, 
14425.8808435683, 12149.3204615723, 15370.8433645383, 16288.3635541566
), Beula = c(21673.72, 21307.61, 19905.39, 20816.78, 20016.94, 
19000.44)), class = "data.frame", row.names = c(NA, -6L))

Data 2

structure(list(Strain = c(1L, 2L, 21L, 44L, 5L, 6L), second_hour = c(19354.4444444444, 
20021.2222222222, 16105.9444444444, 18993.3888888889, 15541.3888888889, 
14767.1666666667), second_hour_n = c(12103.3628274451, 11577.7991047524, 
14425.8808435683, 12149.3204615723, 15370.8433645383, 16288.3635541566
), Beula = c(21673.72, 21307.61, 19905.39, 20816.78, 20016.94, 
19000.44)), class = "data.frame", row.names = c(NA, -6L))

solution1
4 2020-01-08 00:50:26