
ggplot2 scatterplot and labels

I am attempting to produce a scatter plot using the ggplot2 library. My data frame (called scatterPlotData) is in this form:

115 2.3
120 1.6
.
.
.
132 4.3

(The ... signifies many other similar values.) Essentially, it is a two-column data frame. I also have labels to go along with each of those points. Firstly, I'm having trouble with the scatterplot itself. I'm using the following code:

p <- ggplot(scatterPlotData, aes("Distance (bp)", "Intensity"))
p + geom_point()

However, using the above code, I get the following plot:

[image not available]

Obviously, it's not a scatter plot. So, it'd be very helpful if someone could point out what I'm doing wrong.

Secondly, it's about the labels. I will have many data points, so there is a risk of labels overlapping. How should I go about putting a label on each point using ggplot? I have also read that I could use the directlabels package to get a well-labelled, overlap-free scatterplot using different colors; however, I'm not sure how I would go about that with ggplot, as I haven't found any documentation on using directlabels with ggplot.

Any help with either (or both) question(s) is greatly appreciated - thanks.

Lose the inverted commas; at the moment you're making a plot of the text value... Having looked again, you will have problems with the brackets in your variable name (Distance (bp)). Change that to something without the brackets, then make the ggplot call without the inverted commas:

# Assuming Distance (bp) is the first column
names(scatterPlotData)[1] <- "Distance"
p <- ggplot(scatterPlotData, aes(Distance, Intensity)) + geom_point()
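To see why the quotes matter, here is a minimal sketch on a toy data frame. The name toy and its three rows (taken from the values shown in the question) are stand-ins for scatterPlotData, and the column names Distance and Intensity are assumed. A quoted string inside aes() is treated as a constant rather than a column name, so every row lands on the same position:

```r
library(ggplot2)

# Hypothetical stand-in for scatterPlotData (values from the question)
toy <- data.frame(Distance = c(115, 120, 132),
                  Intensity = c(2.3, 1.6, 4.3))

# Quoted: each string is a constant, so all rows collapse onto one position
p_quoted <- ggplot(toy, aes("Distance", "Intensity")) + geom_point()

# Unquoted: aes() looks the names up as columns of the data frame
p_bare <- ggplot(toy, aes(Distance, Intensity)) + geom_point()

# layer_data() returns the values ggplot2 actually draws
x_quoted <- unique(as.numeric(layer_data(p_quoted)$x))
x_bare <- layer_data(p_bare)$x
```

Comparing x_quoted with x_bare shows a single x position in the quoted case and one position per row in the unquoted case.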

As for non-overlapping labels, this is a vexed issue with lots of discussion on SO. I think you'll not get great responses from such a vague question here.

First, it would be much more helpful if you provided a reproducible example that precisely described your data.

You should not be passing variable names to aes in quotes. I'm not sure where you got that from; I can't think of a single example of anyone doing that (unless they were using aes_string, which is specifically for that case).
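If the column names really do have to be supplied as strings, one option in more recent ggplot2 releases (where aes_string has been deprecated) is the .data pronoun inside aes. The sketch below assumes such a ggplot2 version; toy, x_col and y_col are hypothetical names used only for illustration:

```r
library(ggplot2)

# check.names = FALSE keeps the awkward name "Distance (bp)" intact
toy <- data.frame("Distance (bp)" = c(115, 120, 132),
                  Intensity = c(2.3, 1.6, 4.3),
                  check.names = FALSE)

x_col <- "Distance (bp)"  # column names held as strings
y_col <- "Intensity"

# .data[[name]] looks the string up as a column of the plot data
p <- ggplot(toy, aes(x = .data[[x_col]], y = .data[[y_col]])) +
  geom_point() +
  labs(x = x_col, y = y_col)
```

This keeps the strings out of aes as constants while still letting the column be chosen at run time.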

However, it appears that you have an awkward variable name, i.e. Distance (bp). This is non-standard and not recommended. Names should not have spaces in them. The best thing to do would be to rename that column to something sensible and then do something like:

p <- ggplot(scatterPlotData, aes(x = Distance_bp,y = Intensity))
p + geom_point()

If you do not rename the column, something like this might work:

p <- ggplot(scatterPlotData, aes(x = `Distance (bp)`,y = Intensity))
p + geom_point()

Note that those are backticks, not single quotes.
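For reference, a small base-R sketch of how such a name behaves. The data frame toy is a hypothetical stand-in built with check.names = FALSE so that the original name survives; make.names shows the syntactic name R would otherwise substitute:

```r
# Hypothetical stand-in for scatterPlotData with the awkward column name
toy <- data.frame("Distance (bp)" = c(115, 120, 132),
                  Intensity = c(2.3, 1.6, 4.3),
                  check.names = FALSE)

# Backticks also work in base R for non-syntactic names
first_distance <- toy$`Distance (bp)`[1]

# make.names() shows the syntactic name R would substitute by default
cleaned <- make.names(names(toy))

# A hand-picked name is usually more readable than the automatic one
names(toy)[names(toy) == "Distance (bp)"] <- "Distance_bp"
```

Here cleaned holds "Distance..bp." for the first column, which is valid but harder to read than a manual rename such as Distance_bp.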

As for the overlapping data, I would recommend reading here and here.
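As a starting point for the labels, here is a minimal sketch using geom_text from ggplot2 itself. The Label column and its values are invented for illustration, since the question does not show the labels. The ggrepel package is an optional extra that is only assumed to be installed, not something mentioned in the question:

```r
library(ggplot2)

# Hypothetical data: the Label column and its values are invented
toy <- data.frame(Distance = c(115, 120, 132),
                  Intensity = c(2.3, 1.6, 4.3),
                  Label = c("a", "b", "c"))

p <- ggplot(toy, aes(Distance, Intensity, label = Label)) +
  geom_point() +
  # vjust nudges each label above its point; check_overlap = TRUE
  # skips any label that would overlap one already drawn
  geom_text(vjust = -0.8, check_overlap = TRUE)

# If the ggrepel package is installed, geom_text_repel() moves labels
# apart instead of dropping them
if (requireNamespace("ggrepel", quietly = TRUE)) {
  p_repel <- ggplot(toy, aes(Distance, Intensity, label = Label)) +
    geom_point() +
    ggrepel::geom_text_repel()
}
```

With check_overlap = TRUE some labels may simply be omitted on a crowded plot, so a repelling approach tends to suit dense data better.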
