I would like to make a relatively simple plot (reminiscent of timelines such as this: http://www.ats.ucla.edu/stat/sas/code/timeline.gif ), but instead of time on the x-axis, it will show base positions in a genome. The "time spans" will be the coverage ranges of DNA-sequence scaffolds, showing where they fall in the genome, where they overlap, and where there is no coverage. Here is a crude mock-up of what I am looking for, showing contig coverage of rRNAs: http://i.imgur.com/MDABx.png (I left out, but need, an x-axis showing the start and stop positions, and labels for the contigs, i.e. the colored lines). The coordinates are:
Contig# Start1 Stop1 Start2 Stop2 Start3 Stop3 Start4 Stop4
1 1 90 90 100 120 150 200 400
2 1 100 120 150 200 400 NA NA
3 1 30 90 100 120 135 200 400
4 1 100 120 140 200 400 NA NA
5 -35 80 90 100 130 150 200 400
6 1 100 200 300 360 400 NA NA
I am pretty sure this can be done in R, probably with ggplot2, but for some reason I cannot figure it out.
This is not going to be as organized as your plot but it puts the lines in with coordinates that you have yet to provide:
dfr <- data.frame(seg=sample(1:6, 20, replace=TRUE), start=sample(1:100, 20, replace=TRUE), end=sample(1:100,20, replace=TRUE) )
plot(c(1,100), c(1,6), type="n")
# col=seg+1 ties each color to its row; col=2:7 would just recycle over the 20 random segments
with(dfr, segments(y0=seg, y1=seg, x0=start, x1=end, col=seg+1, lwd=3))
With new dataset:
Contig <- read.table(text=" Start1 Stop1 Start2 Stop2 Start3 Stop3 Start4 Stop4
1 1 90 90 100 120 150 200 400
2 1 100 120 150 200 400 NA NA
3 1 30 90 100 120 135 200 400
4 1 100 120 140 200 400 NA NA
5 -35 80 90 100 130 150 200 400
6 1 100 200 300 360 400 NA NA")
# the reshape function can be tricky.... but seems to finally work.
reshape(Contig, direction="long", sep="",
varying=list(Start=names(Contig)[c(1,3,5,7)],
Stop=names(Contig)[c(2,4,6,8)] ) )
#------------------------------
time Start1 Stop1 id
1.1 1 1 90 1
2.1 1 1 100 2
3.1 1 1 30 3
4.1 1 1 100 4
5.1 1 -35 80 5
6.1 1 1 100 6
1.2 2 90 100 1
2.2 2 120 150 2
3.2 2 90 100 3
4.2 2 120 140 4
5.2 2 90 100 5
6.2 2 200 300 6
1.3 3 120 150 1
2.3 3 200 400 2
3.3 3 120 135 3
4.3 3 200 400 4
5.3 3 130 150 5
6.3 3 360 400 6
1.4 4 200 400 1
2.4 4 NA NA 2
3.4 4 200 400 3
4.4 4 NA NA 4
5.4 4 200 400 5
6.4 4 NA NA 6
#-----------------
LContig <- reshape(Contig, direction="long", sep="",
varying=list(Start=names(Contig)[c(1,3,5,7)], Stop=names(Contig)[c(2,4,6,8)] ) )
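# Take the x-range from the long data so that all four spans fit on the plot
# (the wide Contig$Start1/Contig$Stop1 only cover the first span)
plot(range(c(LContig$Start1, LContig$Stop1), na.rm=TRUE), c(1,6),
type="n", xlab="Segments", ylab="Groups")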
with(LContig, segments(y0=id, y1=id, x0=Start1, x1=Stop1, col=2:7, lwd=3))
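The question also asks for an x-axis marking the start and stop positions and for contig labels. A minimal base-R sketch of one way to add them is below. The names `long` and `ticks`, the "Contig n" label text, and the margin settings are my own assumptions for illustration, and the long table is rebuilt with `unlist` rather than `reshape` only to keep the example self-contained.

```r
# Same data as above; the first (unnamed) column becomes the row names
Contig <- read.table(header = TRUE, text = "Start1 Stop1 Start2 Stop2 Start3 Stop3 Start4 Stop4
1 1 90 90 100 120 150 200 400
2 1 100 120 150 200 400 NA NA
3 1 30 90 100 120 135 200 400
4 1 100 120 140 200 400 NA NA
5 -35 80 90 100 130 150 200 400
6 1 100 200 300 360 400 NA NA")

# One row per span: unlist() stacks the columns span by span,
# so the contig id simply repeats 1..6 four times
long <- data.frame(
  id    = rep(seq_len(nrow(Contig)), times = 4),
  start = unlist(Contig[c(1, 3, 5, 7)], use.names = FALSE),
  stop  = unlist(Contig[c(2, 4, 6, 8)], use.names = FALSE)
)
long <- long[complete.cases(long), ]   # drop the missing fourth spans

# Tick positions: every distinct start or stop coordinate
ticks <- sort(unique(c(long$start, long$stop)))

# Widen the left margin so the contig labels fit
op <- par(mar = c(5, 6, 2, 1))
plot(range(ticks), c(0.5, nrow(Contig) + 0.5), type = "n",
     xaxt = "n", yaxt = "n", xlab = "Base position", ylab = "")
segments(x0 = long$start, x1 = long$stop, y0 = long$id, y1 = long$id,
         col = long$id + 1, lwd = 3)
# x-axis with a tick at each start/stop; labels that would overlap are skipped
axis(1, at = ticks, las = 2, cex.axis = 0.7)
# y-axis labeled with the contig numbers
axis(2, at = seq_len(nrow(Contig)),
     labels = paste("Contig", seq_len(nrow(Contig))), las = 1)
par(op)
```

Because closely spaced positions (for example 130, 135 and 140) compete for room, `axis` may omit some tick labels; reducing `cex.axis` or widening the plot device helps.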
Here's a version using ggplot2:
# Never forget
options(stringsAsFactors = FALSE)
# Load ggplot2 and reshape2
library(ggplot2)
library(reshape2)
# Read in the data
contig <- read.table(
text = "id Start1 Stop1 Start2 Stop2 Start3 Stop3 Start4 Stop4
1 1 90 90 100 120 150 200 400
2 1 100 120 150 200 400 NA NA
3 1 30 90 100 120 135 200 400
4 1 100 120 140 200 400 NA NA
5 -35 80 90 100 130 150 200 400
6 1 100 200 300 360 400 NA NA",
header = TRUE
)
# Reshape it
# Melt it all the way down - each data value gets a record
# identified by id and variable name
contig.melt <- melt(contig, id.var = "id")
# Your variable names contain two pieces of information:
# whether this point is a start or a stop, and
# which span this point is associated with.
# Much easier to work with those separately, so I'll parse them
# into variables.
# Which span?
contig.melt$span <- gsub(x = contig.melt$variable,
pattern = ".*(\\d)",
replacement = "\\1")
# Start or stop?
contig.melt$point <- gsub(x = contig.melt$variable,
pattern = "(.*)\\d",
replacement = "\\1")
# Cast it back into a dataset with a record for each span
contig.long <- dcast(contig.melt, id + span ~ point)
# Plot it. The vertical position and line colors are determined by
# the ID. I'm calling that x here, but I'll flip the coords later
ggplot(contig.long, aes(x = id, color = factor(id))) +
# geom_linerange plots a line from Start (ymin) to stop (ymax)
# Control the thickness of the lines with size
# (ggplot2 3.4.0 and later prefer linewidth for this)
geom_linerange(aes(ymin = Start, ymax = Stop), size = 2) +
# Flip the coordinates
coord_flip() +
# Make it pretty
scale_colour_brewer("RNA ID", palette = "Dark2") +
labs(x = "RNA ID", y = "Position") +
theme_bw()
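Both plots show the uncovered stretches only visually. If the "places with no coverage" from the question are also wanted as a table, a base-R sketch along the following lines could compute them. The helper `find_gaps` and the column names `gap_start` and `gap_end` are hypothetical names introduced here, not part of any package, and each gap is reported by the covered positions that flank it.

```r
# Same data as in the ggplot2 example
contig <- read.table(header = TRUE, text = "id Start1 Stop1 Start2 Stop2 Start3 Stop3 Start4 Stop4
1 1 90 90 100 120 150 200 400
2 1 100 120 150 200 400 NA NA
3 1 30 90 100 120 135 200 400
4 1 100 120 140 200 400 NA NA
5 -35 80 90 100 130 150 200 400
6 1 100 200 300 360 400 NA NA")

# One row per span, dropping the missing fourth spans
long <- data.frame(
  id    = rep(contig$id, times = 4),
  start = unlist(contig[paste0("Start", 1:4)], use.names = FALSE),
  stop  = unlist(contig[paste0("Stop", 1:4)], use.names = FALSE)
)
long <- long[complete.cases(long), ]

# Hypothetical helper: uncovered intervals between the spans of one contig
find_gaps <- function(start, stop) {
  ord   <- order(start)
  start <- start[ord]
  stop  <- stop[ord]
  n     <- length(start)
  if (n < 2) {
    return(data.frame(gap_start = numeric(0), gap_end = numeric(0)))
  }
  reach  <- cummax(stop)[-n]      # furthest position covered before each later span
  is_gap <- start[-1] > reach     # the next span starts beyond everything seen so far
  data.frame(gap_start = reach[is_gap], gap_end = start[-1][is_gap])
}

# Apply the helper to each contig and stack the results
gaps <- do.call(rbind, lapply(split(long, long$id), function(d) {
  g <- find_gaps(d$start, d$stop)
  data.frame(id = rep(d$id[1], nrow(g)), g)
}))
rownames(gaps) <- NULL
print(gaps)
```

Using `cummax` rather than the previous span's stop means overlapping or nested spans within a contig would not be reported as gaps; the sample data has no such overlaps, so that is a precaution rather than a requirement.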