Barplot to plot frequency of DNA in different sequence lengths
I have the data frame df shown below, read with df <- read.table("WT1.txt", header = TRUE). I want to plot a histogram showing the A, C, G and T frequency for each length value. Is there a better way to plot this?
df
length A C G T
17 95668 73186 162726 730847
18 187013 88641 120631 334695
19 146061 373719 152215 303973
20 249897 73862 115441 343179
21 219899 82356 109536 636704
22 226368 101499 111974 1591106
23 188187 112155 98002 1437280
You could melt the data frame into long format, keeping the variable length as the id column, and plot a stacked bar plot with ggplot2:
df <- read.table(text=
"length A C G T
17 95668 73186 162726 730847
18 187013 88641 120631 334695
19 146061 373719 152215 303973
20 249897 73862 115441 343179
21 219899 82356 109536 636704
22 226368 101499 111974 1591106
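23 188187 112155 98002 1437280", header = TRUE)
library(reshape2)
# Melt to long format: one row per (length, base) pair.
# A new name is used so the wide df stays available.
df_long <- melt(df, id.vars = "length")
library(ggplot2)
# Stacked bars: one bar per length, one segment per base
ggplot(df_long) +
  geom_bar(aes(x = length, y = value, fill = variable), stat = "identity")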
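For comparison, the same stacked plot can be sketched with tidyr::pivot_longer() and geom_col(), which is shorthand for geom_bar(stat = "identity"). This is only an alternative under the assumption that tidyr and ggplot2 are installed; the names df_long, base, count and p are illustrative and do not come from the answer above.

```r
library(tidyr)
library(ggplot2)

df <- read.table(text =
"length A C G T
17 95668 73186 162726 730847
18 187013 88641 120631 334695
19 146061 373719 152215 303973
20 249897 73862 115441 343179
21 219899 82356 109536 636704
22 226368 101499 111974 1591106
23 188187 112155 98002 1437280", header = TRUE)

# Wide -> long: one row per (length, base) pair.
# "base" and "count" are illustrative column names.
df_long <- pivot_longer(df, cols = c(A, C, G, T),
                        names_to = "base", values_to = "count")

# factor(length) gives one discrete bar per sequence length
p <- ggplot(df_long, aes(x = factor(length), y = count, fill = base)) +
  geom_col() +
  labs(x = "Sequence length", y = "Count", fill = "Base")
print(p)
```

Treating length as a factor keeps the bars evenly spaced even if some lengths were missing from the data.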
Use dplyr to calculate the frequency of each base within each length, and ggplot2 to draw the bar plot. I prefer stat = "identity", position = "dodge" over stat = "identity" alone, because side-by-side bars give a better sense of what the data looks like.
library(tidyverse)
# df is the wide data frame (columns: length, A, C, G, T)
gather(df, Base, value, -length) %>%
  group_by(length) %>%
  # Share of each base within its length, so each group sums to 1
  mutate(frequency = value / sum(value)) %>%
  ggplot(aes(factor(length), y = frequency, fill = Base)) +
  geom_bar(stat = "identity", position = "dodge",
           color = "black", width = 0.6) +
  labs(x = "Base pairs",
       y = "Frequency",
       fill = "Base") +
  scale_y_continuous(limits = c(0, 1)) +
  scale_fill_brewer(palette = "Set1") +
  theme_classic()
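The per-length normalisation can also be checked without any packages. The sketch below is a base R cross-check, not part of the answer above: it uses prop.table() to divide each row by its total and confirms that the four frequencies sum to 1 for every length. The names counts and freq are illustrative.

```r
df <- read.table(text =
"length A C G T
17 95668 73186 162726 730847
18 187013 88641 120631 334695
19 146061 373719 152215 303973
20 249897 73862 115441 343179
21 219899 82356 109536 636704
22 226368 101499 111974 1591106
23 188187 112155 98002 1437280", header = TRUE)

# Count matrix with one row per length and one column per base
counts <- as.matrix(df[, c("A", "C", "G", "T")])
rownames(counts) <- df$length

# margin = 1 divides every row by its own total
freq <- prop.table(counts, margin = 1)
stopifnot(all(abs(rowSums(freq) - 1) < 1e-12))

# barplot() draws one group per column, so transpose to group by length
barplot(t(freq), beside = TRUE, legend.text = TRUE,
        xlab = "Sequence length", ylab = "Frequency", ylim = c(0, 1))
```

With beside = TRUE the bars are drawn side by side, which corresponds to position = "dodge" in ggplot2.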