Scatter plot of variables in a column of a tibble using dplyr & ggplot2

I have the following tibble, which I would like to use to make a scatter plot (using ggplot2) of the logcpm values of AA_Colon vs. BA_Colon, matched by gene.

             gene   sample     logcpm
             <chr>    <chr>      <dbl>
 1 ENSG00000169903 AA_Colon 0.31536340
 2 ENSG00000145321 AA_Colon 0.19735593
 3 ENSG00000171560 AA_Colon 0.00000000
 4 ENSG00000171557 AA_Colon 0.19735593
 5 ENSG00000106327 AA_Colon 0.06882901
 6 ENSG00000228278 AA_Colon 0.13452328
 7 ENSG00000138115 AA_Colon 0.31536340
 8 ENSG00000148702 AA_Colon 0.00000000
 9 ENSG00000140107 AA_Colon 0.00000000
10 ENSG00000197723 AA_Colon 0.00000000
11 ENSG00000169903 BA_Colon 1.14724849
12 ENSG00000145321 BA_Colon 0.08113901
13 ENSG00000171560 BA_Colon 0.36654820
14 ENSG00000171557 BA_Colon 0.23088996
15 ENSG00000106327 BA_Colon 0.08113901
16 ENSG00000228278 BA_Colon 0.08113901
17 ENSG00000138115 BA_Colon 0.42987550
18 ENSG00000148702 BA_Colon 0.00000000
19 ENSG00000140107 BA_Colon 0.00000000
20 ENSG00000197723 BA_Colon 0.08113901

Currently, I'm doing the following, which works:

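library(tidyr)    # spread(), and re-exports %>%
library(ggplot2)

# `tibble` is the name of the data frame shown above
tibble %>%
    spread(key = sample, value = logcpm) %>%
    ggplot(aes(x = AA_Colon, y = BA_Colon)) +
    geom_point()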
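
For reference, the same reshaping can be sketched with pivot_wider(), which supersedes spread() in tidyr 1.0.0 and later. This is only an illustration: the name `expr` and the three-gene subset of the data above are assumptions made to keep the example self-contained.

```r
library(tidyr)
library(ggplot2)

# Assumption: a three-gene subset of the tibble above, named `expr` here
expr <- tibble::tibble(
  gene   = rep(c("ENSG00000169903", "ENSG00000145321", "ENSG00000171560"), times = 2),
  sample = rep(c("AA_Colon", "BA_Colon"), each = 3),
  logcpm = c(0.31536340, 0.19735593, 0.00000000,
             1.14724849, 0.08113901, 0.36654820)
)

# One row per gene, one column per sample
wide <- pivot_wider(expr, names_from = sample, values_from = logcpm)

# Same scatter plot as the spread() version
p <- ggplot(wide, aes(x = AA_Colon, y = BA_Colon)) +
  geom_point()
```

Printing `p` draws the plot. This still widens the data, so it is a drop-in replacement for spread() rather than a way around it.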

But I'm wondering whether there's a more elegant way to work directly with the tidy format and extract the two vectors to plot, instead of spreading the data into two columns.

When the data are in tidy format, ggplot gives a scatterplot with just two columns of dots, one for AA_Colon and one for BA_Colon.

ggplot(tibble, aes(x = sample, y = logcpm)) +
    geom_point()

Boxplots with geom_jitter might be more useful.

ggplot(tibble, aes(x = sample, y = logcpm)) +
    geom_boxplot() +
    geom_jitter(width = 0.3)
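
If the goal is still a gene-matched scatter plot, one possible way to stay in the long format is to join the table to itself on gene. The following is a rough sketch, not necessarily more elegant than spreading: the name `expr`, the three-gene subset, and the `_AA`/`_BA` suffixes are assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Assumption: a three-gene subset of the tibble from the question, named `expr`
expr <- tibble::tibble(
  gene   = rep(c("ENSG00000169903", "ENSG00000145321", "ENSG00000171560"), times = 2),
  sample = rep(c("AA_Colon", "BA_Colon"), each = 3),
  logcpm = c(0.31536340, 0.19735593, 0.00000000,
             1.14724849, 0.08113901, 0.36654820)
)

# Pair each gene's AA_Colon row with its BA_Colon row
paired <- inner_join(
  filter(expr, sample == "AA_Colon"),
  filter(expr, sample == "BA_Colon"),
  by = "gene",
  suffix = c("_AA", "_BA")   # hypothetical suffixes for the duplicated columns
)

# logcpm_AA and logcpm_BA are the two matched vectors
p <- ggplot(paired, aes(x = logcpm_AA, y = logcpm_BA)) +
  geom_point()
```

Because inner_join() keeps only genes present in both samples, a gene missing from either sample is dropped silently. Note that ggplot2 still needs the x and y values in separate columns, so some form of pairing is required either way.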
