ggplot color, shape and size by factor variables in dataframe over several regions with legend
I have the following dataframe:
structure(list(PS_position = c(54733745L, 54736536L, 54734312L, 54735312L, 54733745L, 54736536L, 54734312L, 54735312L),
chr_key = c(19L,19L, 19L, 19L, 19L, 19L, 19L, 19L),
hit_count = c(20L, 1L, 5L,15L, 20L, 1L, 5L, 15L),
pconvert = c(0.448, 0.55, 0.8, 0.92, 0.448, 0.55, 0.8, 0.92),
probe_type = c("Non_polymorphic", "preselected", "unvalidated", "validated", "Non_polymorphic", "preselected", "unvalidated", "validated"),
region_name = c("DL1", "DL1", "DL1", "DL1", "DL2", "DL2", "DL2", "DL2"),
start = c(54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L),
stop = c(54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L)),
row.names = c(NA, -8L), class = c("data.table", "data.frame"))
I would like to plot PS_position for each region_name on the x-axis, colored by probe_type, with the shape based on pconvert categories (0.3-0.5, 0.51-0.7, 0.71-0.9, > 0.9) and the size of the shape based on hit_count, over all unique region_names in the dataframe, with a legend describing the same. The xlim for the plot will be start / stop from the dataframe.
Of course, the actual values will vary for each unique region_name. Any ideas on how to best achieve this? Thanks!
Edit: I had developed something in base R which does not use hit_count or pconvert:
region = unique(df$region_name)
for(i in seq_along(region))
{
probes = df$PS_position
probe_type = factor(df$probe_type)
df$cols = as.numeric(as.factor(df$probe_type))
legend.cols = as.numeric(as.factor(levels(df$probe_type)))
#should also send the start and stop into PS_position
cols = c("black", "blue", "green", "yellow")
#Use logarithmic scale
par(xpd = T)
plot(1, 1, ylim = c(0.5, length(probes)), xlim = c(min(probes) - 20, max(probes)+10),#, main = paste("Probes ", region, sep = ""),
xlab = "PS_position", bty="n", type = "n", yaxt = "n", ylab = "")
title(region[i], line=0)
begin = min(probes)
end = max(probes)
n = length(probes)
Then I sequentially plot the probes one after another, but I don't need that anymore. I just want to plot all PS_position values at once, and they should reflect the actual start-stop range and their relative position within those bounds. Note that the base R code above and below is one block; please copy and paste the two parts together.
for(i in 1:length(probes))
{
lines(x = c(begin, end), y = c(n+1-i, n+1-i), col = "blue", lwd = .8)
xs = probes[1:i]
#cols_i = cols[probe_type[1:i]]
points(x = xs, y = rep(n+1-i, length(xs)), pch = 18, cex = 1.0, col = df$cols)
text(i, x = -50, y = n+1-i, adj = 1.5)
}
add_legend("topright", "Probe_Type", levels(probe_type), fill = legend.cols, horiz=T)
}
dev.off()
Trying to convert this to ggplot2.
How about this:
I have taken your data and added the categorical pconvert_cat variable:
# comparison of the two variables:
> df[, c(4, 9)]
pconvert pconvert_cat
1 0.448 0.3-0.5
2 0.550 0.51-0.7
3 0.800 0.71-0.9
4 0.920 >0.9
5 0.448 0.3-0.5
6 0.550 0.51-0.7
7 0.800 0.71-0.9
8 0.920 >0.9
I've tried to plot what you wanted from your question using ggplot2. Essentially, you want to facet by region_name and then map all the other variables to the aesthetics (aes) you mention in your question.
ggplot(df, aes(x = PS_position, y = 0,
colour = probe_type, shape = pconvert_cat, size = hit_count)) +
geom_point() +
scale_shape_manual(values = c(3, 15, 16, 17)) +
coord_cartesian(xlim = c(min(df$start), max(df$stop))) +
facet_wrap(~ region_name, nrow = 2) +
theme_minimal() + theme(panel.grid = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
This is what it looks like:
Which is probably not ideal. I do not know of any geom_...() function which would simply graph the 'x difference' between points and not bother with the y-axis. SO community, can we do such a thing? Of course, this depends on whether you want any variables for the y-axis too.
Assuming you want everything on the same horizontal plane, I have set y to a constant (0). Maybe you could set y = chr_key, as I notice it is constant (at least in this small data set)?
Also, setting xlim = c(min(df$start), max(df$stop)) means that all your points sit quite far to the right, as you can see above. Unless you specifically want this, maybe consider dropping the line with coord_cartesian():
ggplot(df, aes(x = PS_position, y = 0,
colour = probe_type, shape = pconvert_cat, size = hit_count)) +
geom_point() +
scale_shape_manual(values = c(3, 15, 16, 17)) +
facet_wrap(~ region_name, nrow = 2) +
theme_minimal() + theme(panel.grid = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
To get this:
The differences between the x-values of the points are clearer here.
Some things to consider:
Will there be more than one observation for the same probe_type and pconvert_cat values? If so, the colour and shape aesthetics will come more into play.
Finally, I strongly agree with Rémi's comment that you should let us know what you've already tried. Then I don't have to guess quite so much in the answer.
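One way to check whether any probe_type / pconvert_cat combination occurs more than once within a region is to count rows per combination. The sketch below uses base R's aggregate() on a small stand-in data frame holding only the three relevant columns from the question; the helper column n is an assumption added for counting.

```r
# Stand-in for the question's data: only the columns needed for counting
df <- data.frame(
  region_name  = rep(c("DL1", "DL2"), each = 4),
  probe_type   = rep(c("Non_polymorphic", "preselected",
                       "unvalidated", "validated"), 2),
  pconvert_cat = rep(c("0.3-0.5", "0.51-0.7", "0.71-0.9", ">0.9"), 2)
)

# Helper column (hypothetical name): one per row, summed per combination
df$n <- 1L
counts <- aggregate(n ~ region_name + probe_type + pconvert_cat,
                    data = df, FUN = sum)

# Combinations that appear more than once within a region
repeated <- counts[counts$n > 1, ]
nrow(repeated)
```

If repeated has rows for your real data, points with identical colour and shape will share a facet, and only x position and size will tell them apart.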
EDIT
In reply to your comment, using facet_wrap() does not mean that scales are fixed. You can change the scales argument to "free_x" in your case, so that you can have different start and stop values for each region_name. For more information about different facet scales look here. You might want to use geom_blank() as is discussed on that page. You will have to decide which of the methods listed there works best for your data. Note that when you add more facets for more region_names, and keep just one column of facets, they should come closer together, and the issue of having a y-scale there will become less important as there won't be so much empty space. (So, for example, you have five different region_names and you set nrow = 5.)
In summary, I think my code, with some of the facet scale changes that you can decide upon, is good to go.
Data
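As a rough sketch of the free-scale idea, the code below combines facet_wrap(scales = "free_x") with geom_blank() so that each region's panel covers roughly its own start to stop range (the default scale expansion adds a little padding). The data frame is a stand-in: the DL2 coordinates are shifted by 10000 purely for illustration so the two regions differ, and the names bounds and limits_long, as well as the legend titles, are assumptions rather than anything from the question.

```r
library(ggplot2)

# Toy data shaped like df; DL2 values are made up (shifted by 10000)
df <- data.frame(
  PS_position  = c(54733745, 54736536, 54734312, 54735312,
                   54743745, 54746536, 54744312, 54745312),
  hit_count    = c(20, 1, 5, 15, 20, 1, 5, 15),
  probe_type   = rep(c("Non_polymorphic", "preselected",
                       "unvalidated", "validated"), 2),
  pconvert_cat = factor(rep(c("0.3-0.5", "0.51-0.7", "0.71-0.9", ">0.9"), 2),
                        levels = c("0.3-0.5", "0.51-0.7", "0.71-0.9", ">0.9")),
  region_name  = rep(c("DL1", "DL2"), each = 4),
  start        = rep(c(54724479, 54734479), each = 4),
  stop         = rep(c(54736536, 54746536), each = 4)
)

# One row per region and bound. geom_blank() draws nothing but still
# trains the x scale, so each free facet is stretched to its own bounds.
bounds <- unique(df[, c("region_name", "start", "stop")])
limits_long <- data.frame(
  region_name = rep(bounds$region_name, times = 2),
  x           = c(bounds$start, bounds$stop)
)

p <- ggplot(df, aes(x = PS_position, y = 0)) +
  geom_blank(data = limits_long, aes(x = x, y = 0), inherit.aes = FALSE) +
  geom_point(aes(colour = probe_type, shape = pconvert_cat, size = hit_count)) +
  scale_shape_manual(values = c(3, 15, 16, 17)) +
  facet_wrap(~ region_name, ncol = 1, scales = "free_x") +
  labs(colour = "Probe type", shape = "pconvert", size = "Hit count") +
  theme_minimal() +
  theme(panel.grid   = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y  = element_blank(),
        axis.ticks.y = element_blank())

p
```

The labs() call only renames the three legends; drop it if the default titles (the column names) are good enough.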
df <- structure(list(PS_position = c(54733745L, 54736536L, 54734312L, 54735312L, 54733745L, 54736536L, 54734312L, 54735312L),
chr_key = c(19L,19L, 19L, 19L, 19L, 19L, 19L, 19L),
hit_count = c(20L, 1L, 5L,15L, 20L, 1L, 5L, 15L),
pconvert = c(0.448, 0.55, 0.8, 0.92, 0.448, 0.55, 0.8, 0.92),
probe_type = c("Non_polymorphic", "preselected", "unvalidated", "validated", "Non_polymorphic", "preselected", "unvalidated", "validated"),
region_name = c("DL1", "DL1", "DL1", "DL1", "DL2", "DL2", "DL2", "DL2"),
start = c(54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L),
stop = c(54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L)),
row.names = c(NA, -8L), class = c("data.table", "data.frame"))
df$pconvert_cat <- as.factor(ifelse(df$pconvert >= 0.3 & df$pconvert <= 0.5, "0.3-0.5",
ifelse(df$pconvert > 0.5 & df$pconvert <= 0.7, "0.51-0.7",
ifelse(df$pconvert > 0.7 & df$pconvert <= 0.9, "0.71-0.9", ">0.9"))))
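As an alternative to the nested ifelse() above, base R's cut() can do the same binning in one call. This is only a sketch under the assumption that the four categories are meant as [0.3, 0.5], (0.5, 0.7], (0.7, 0.9] and (0.9, Inf). Two differences are worth knowing: values below 0.3 become NA here, whereas the nested ifelse() would label them ">0.9", and the factor levels keep the order given in labels instead of the sorted order that as.factor() produces (which may vary with locale and affects which shape each category gets in scale_shape_manual()).

```r
# Same pconvert values as in the data above
pconvert <- c(0.448, 0.55, 0.8, 0.92, 0.448, 0.55, 0.8, 0.92)

# right = TRUE closes each bin on the right, e.g. (0.5, 0.7];
# include.lowest = TRUE also closes the first bin on the left: [0.3, 0.5]
pconvert_cat <- cut(
  pconvert,
  breaks = c(0.3, 0.5, 0.7, 0.9, Inf),
  labels = c("0.3-0.5", "0.51-0.7", "0.71-0.9", ">0.9"),
  right = TRUE,
  include.lowest = TRUE
)

# Levels follow the order of `labels`
levels(pconvert_cat)
```

Assigning the result with df$pconvert_cat <- pconvert_cat (or calling cut() directly on df$pconvert) would slot it into the plotting code unchanged.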