I'm new to R, but I'm getting dangerous. I want to make a massive gene expression line chart from about 2000 genes that were monitored after drug treatment. After loading from a CSV file, my data frame looks like this:
head(tmp)
gene_symbol untreated X1hr.avg X3hr.avg X6hr.avg X24hr.avg
1 ERRFI1 0.16612478 -2.0758630 -2.5892085 -2.02039809 -2.4124696
2 ERRFI1 0.27750147 -2.3086333 -3.0538376 -4.01436186 -4.7491462
3 CTDSPL2 0.13172411 -0.7920983 -0.3580963 -0.76213664 -0.8171385
4 CTDSPL2 -0.05205203 -0.9551288 -0.2072265 -0.76993891 -1.0028680
5 SLC26A2 0.20268100 0.5188266 0.5429924 0.01970562 -1.1955852
6 SLC29A4 0.19658238 -0.8102461 -0.9019243 -1.50714838 -1.4648872
I would like to transform this dataframe into something like this:
gene_symbol ratio treatment
ERRFI1 0.16612478 untreated
ERRFI1 -2.0758630 X1hr.avg
ERRFI1 -2.5892085 X3hr.avg
ERRFI1 -2.02039809 X6hr.avg
ERRFI1 -2.4124696 X24hr.avg
etc...
This would allow me to plot via ggplot:
ggplot(data=tmp, aes(x=factor(treatment), y=ratio, group=gene_symbol)) + geom_line() + geom_point()
What you're looking for is the melt() function from the reshape2 package. I used your variable names, but I would suggest storing the melted data in a different variable.
# read.table() already returns a data.frame; the unnamed first column becomes the row names
tmp <- read.table(text="gene_symbol untreated X1hr.avg X3hr.avg X6hr.avg X24hr.avg
1 ERRFI1 0.16612478 -2.0758630 -2.5892085 -2.02039809 -2.4124696
2 ERRFI1 0.27750147 -2.3086333 -3.0538376 -4.01436186 -4.7491462
3 CTDSPL2 0.13172411 -0.7920983 -0.3580963 -0.76213664 -0.8171385
4 CTDSPL2 -0.05205203 -0.9551288 -0.2072265 -0.76993891 -1.0028680
5 SLC26A2 0.20268100 0.5188266 0.5429924 0.01970562 -1.1955852
6 SLC29A4 0.19658238 -0.8102461 -0.9019243 -1.50714838 -1.4648872", header=TRUE)
library(reshape2)
library(ggplot2)
# Melt to long format: one row per gene/treatment pair
tmp <- melt(data=tmp, id.vars=c("gene_symbol"))
# Rename the default "variable" and "value" columns
names(tmp) <- sub("variable", "treatment", names(tmp))
names(tmp) <- sub("value", "ratio", names(tmp))
ggplot(data=tmp, aes(x=factor(treatment), y=ratio, group=gene_symbol)) + geom_line(aes(colour=gene_symbol)) + geom_point()
I'm not sure this is a useful way to present this type of data, though. You might want to rethink what exactly your goal is.
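One thing to watch for: in the sample data some gene symbols (ERRFI1, CTDSPL2) appear on more than one row, so grouping by gene_symbol alone would join those rows into a single line. Below is a minimal sketch of one possible way around that, using a small made-up data frame in the same shape. The `probe_id` column and the names `wide` and `long` are hypothetical, introduced only for illustration. It also names the melted columns directly through melt()'s `variable.name` and `value.name` arguments instead of renaming them afterwards.

```r
library(reshape2)

# Toy data in the same shape as the question (values rounded, not real results).
wide <- data.frame(
  gene_symbol = c("ERRFI1", "ERRFI1", "CTDSPL2"),
  untreated   = c(0.17, 0.28, 0.13),
  X1hr.avg    = c(-2.08, -2.31, -0.79),
  X3hr.avg    = c(-2.59, -3.05, -0.36)
)

# Hypothetical helper column: one id per original row, so repeated
# gene symbols can still be told apart after melting.
wide$probe_id <- seq_len(nrow(wide))

# Melt to long format and name the new columns in the same call.
long <- melt(wide,
             id.vars       = c("gene_symbol", "probe_id"),
             variable.name = "treatment",
             value.name    = "ratio")
```

With this layout, using `group = probe_id` (and optionally `colour = gene_symbol`) in the aes() call should draw one line per original row rather than one per gene symbol.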
What you're really doing is "stacking" your variables, so you can also use the stack() function.
out <- data.frame(tmp[1], stack(tmp[-1]))
You'll get a warning, but it is only a warning, not an error. It just tells you that the output has new row names.
Here are the first and last few rows of the resulting "stacked" data.frame:
> head(out)
gene_symbol values ind
1 ERRFI1 0.16612478 untreated
2 ERRFI1 0.27750147 untreated
3 CTDSPL2 0.13172411 untreated
4 CTDSPL2 -0.05205203 untreated
5 SLC26A2 0.20268100 untreated
6 SLC29A4 0.19658238 untreated
> tail(out)
gene_symbol values ind
25 ERRFI1 -2.4124696 X24hr.avg
26 ERRFI1 -4.7491462 X24hr.avg
27 CTDSPL2 -0.8171385 X24hr.avg
28 CTDSPL2 -1.0028680 X24hr.avg
29 SLC26A2 -1.1955852 X24hr.avg
30 SLC29A4 -1.4648872 X24hr.avg
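The stacked columns come out named values and ind, whereas the question asked for ratio and treatment. Here is a rough base R sketch of how the same stack() approach might produce those names, again on made-up toy data (`wide` and `stacked` are illustrative names only). Repeating gene_symbol explicitly with rep() is an assumption on my part for sidestepping the recycling that triggers the warning, and the factor levels are set from the column order so the treatments stay in time order on a plot axis.

```r
# Toy data in the same shape as the question (values rounded, not real results).
wide <- data.frame(
  gene_symbol = c("ERRFI1", "CTDSPL2"),
  untreated   = c(0.17, 0.13),
  X1hr.avg    = c(-2.08, -0.79),
  X24hr.avg   = c(-2.41, -0.82)
)

# Stack every column except the first into values/ind pairs.
stacked <- stack(wide[-1])

out <- data.frame(
  # Repeat the gene symbols once per stacked column instead of relying on recycling.
  gene_symbol = rep(wide$gene_symbol, times = ncol(wide) - 1),
  ratio       = stacked$values,
  # Fix the level order to the original column order (time order).
  treatment   = factor(stacked$ind, levels = names(wide)[-1])
)
```

The result has the gene_symbol / ratio / treatment layout from the question and can be passed to the same ggplot() call.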