
Adding points to plot using ggplot2

Here are the first 9 rows (out of 54) and the first 8 columns (out of 1003) of my dataset:

    stream n rates     means          1         2         3         4
1   Brooks 3   3.0 0.9629152 0.42707006 1.9353659 1.4333884 1.8566225
2  Siouxon 3   3.0 0.5831929 0.90503736 0.2838483 0.2838483 1.0023212
3 Speelyai 3   3.0 0.6199235 0.08554021 0.7359903 0.4841935 0.7359903
4   Brooks 4   7.5 0.9722707 1.43338843 1.8566225 0.0000000 1.3242210
5  Siouxon 4   7.5 0.5865031 0.50574543 0.5057454 0.2838483 0.4756304
6 Speelyai 4   7.5 0.6118634 0.32252396 0.4343109 0.6653132 2.2294652
7   Brooks 5  10.0 0.9637475 0.88984211 1.8566225 0.7741612 1.3242210
8  Siouxon 5  10.0 0.5804420 0.47501800 0.7383634 0.5482181 0.6430847
9 Speelyai 5  10.0 0.5959238 0.15079491 0.2615963 0.4738504 0.0000000

Here is a simple plot I made using the values in the means column for all rows with the stream name Speelyai (18 rows).

[Image: line plot of the means for the Speelyai rows]

The means column is calculated by taking the mean of the entire row. Each column represents one simulation, so the means column is the mean of 1000 simulations. I would like to plot the actual simulation values as well. I think it would be informative to show not only the mean (as a line) but also the "raw" data (the simulations) as points. I see that I can use geom_point(), but I am not sure how to get all the points for every row that has the stream name "Speelyai".

Thanks


[Image: the same plot with the simulation values overlaid as points]

As you can see, the scales are very different, which I would expect, given that these points are the results of simulations (resampling the original data). But how could I overlay these points on my original plot in a way that still preserves the visual content? In this image the line looks almost flat, but in my original image we can see that it fluctuates quite a bit, just on a small scale...

I would suggest reshaping your data into long format rather than wide. For example:

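library("tidyr")
library("ggplot2")
# Stack the simulation columns into a single "value" column, keeping the id columns
my_data_tidy <- gather(my_data, column, value, -c(stream, n, rates, means))
ggplot(subset(my_data_tidy, stream == "Speelyai"), aes(rates, value)) +
  geom_point() +
  # fun replaces fun.y, which is deprecated as of ggplot2 3.3.0
  stat_summary(fun = "mean", geom = "line")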

Note that this will also recalculate the means from your data. If you want to use your existing means instead, you could do:

ggplot(subset(my_data_tidy, stream == "Speelyai"), aes(rates, value)) +
  geom_point() + geom_line(aes(rates, means), data = subset(my_data, stream == "Speelyai"))
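In case it helps to see the reshaping step on its own, here is a minimal sketch using pivot_longer(), which newer versions of tidyr offer alongside gather(). The toy my_data below is made up for illustration (three simulation columns instead of 1000), and the names "column" and "value" simply mirror the call above.

```r
library(tidyr)

# Hypothetical toy data in the same wide layout as the question:
# one row per stream/rate, one column per simulation ("1", "2", "3").
my_data <- data.frame(
  stream = c("Brooks", "Speelyai", "Brooks", "Speelyai"),
  n      = c(3, 3, 4, 4),
  rates  = c(3.0, 3.0, 7.5, 7.5),
  means  = c(1.0, 0.5, 1.2, 0.6),
  `1`    = c(0.4, 0.1, 1.4, 0.3),
  `2`    = c(1.9, 0.7, 1.9, 0.4),
  `3`    = c(0.7, 0.7, 0.3, 1.1),
  check.names = FALSE
)

# Every column except the id columns becomes a (column, value) pair.
my_data_tidy <- pivot_longer(
  my_data,
  cols      = -c(stream, n, rates, means),
  names_to  = "column",
  values_to = "value"
)

# 4 rows x 3 simulations = 12 rows in long format.
nrow(my_data_tidy)
```

Each original row is repeated once per simulation, so the id columns (including means) are carried along and can still be used for the line.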

I agree with @NickKennedy that reshaping your data from wide to long is a good idea:

library(reshape)
x2 <- melt(x, id = c("stream", "n", "rates"))
x2 <- x2[which(x2$variable != "means"), ]  # this eliminates the entries for means

Now it's time to recalculate the means:

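library(data.table)
setDT(x2)              # convert x2 to a data.table by reference
setkey(x2, "stream")   # key on stream so x2["Speelyai"] selects that stream's rows
means.sp <- x2["Speelyai", .(mean.stream = mean(value)), by = rates]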
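As a cross-check that does not need data.table, roughly the same per-rate means can be sketched in base R with aggregate(). The x2 below is a made-up stand-in for the melted data, with only the columns this step needs.

```r
# Hypothetical long-format data standing in for x2 (values are made up).
x2 <- data.frame(
  stream = rep(c("Brooks", "Speelyai"), each = 6),
  rates  = rep(c(3.0, 7.5), each = 3, times = 2),
  value  = c(0.4, 1.9, 0.7, 1.4, 1.9, 0.3,
             0.1, 0.7, 0.7, 0.3, 0.4, 1.1)
)

# Mean of value for each rate, restricted to one stream.
means.sp <- aggregate(value ~ rates,
                      data = subset(x2, stream == "Speelyai"),
                      FUN = mean)

# Rename the result column to match the data.table version.
names(means.sp)[names(means.sp) == "value"] <- "mean.stream"
means.sp
```

Either way, means.sp ends up with one row per rate, which is the shape geom_line() needs.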

So now you can plot:

library(ggplot2)
p<-ggplot(means.sp,aes(rates,mean.stream))+geom_line()

This is exactly what you had, so now let's add the points:

p<-p+geom_point(data=x2[x2$stream=="Speelyai",],aes(rates,value))

Notice that in the call to geom_point you need to pass data= explicitly, because you are working with a different dataset from the one you specified in the call to ggplot.
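To see the two-dataset pattern in isolation, here is a small self-contained sketch. Both data frames are made up for illustration (sims is a hypothetical name), and the alpha value is just one way to keep dense points from hiding the line.

```r
library(ggplot2)

# Hypothetical stand-ins for means.sp (one row per rate) and the
# long-format simulation values; all numbers are made up.
means.sp <- data.frame(rates = c(3.0, 7.5, 10.0),
                       mean.stream = c(0.62, 0.61, 0.60))
sims <- data.frame(rates = rep(c(3.0, 7.5, 10.0), each = 3),
                   value = c(0.1, 0.7, 0.5, 0.3, 0.4, 0.7, 0.2, 0.3, 0.5))

# The line inherits data and aesthetics from ggplot();
# the point layer overrides both with its own data= and aes().
p <- ggplot(means.sp, aes(rates, mean.stream)) +
  geom_line() +
  geom_point(data = sims, aes(rates, value), alpha = 0.4)
p
```

Any layer without its own data= falls back to the data frame given to ggplot(), which is why only the point layer needs the extra argument.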

========== EDIT TO ADD =============

Replying to your comments, and borrowing from the answer @akrun gave you here, you'll need to add the calculation of the error and then change the call to geom_point:

df2 <- data.frame(stream=c('Brooks', 'Siouxon', 'Speelyai'), 
      value=c(0.944062036, 0.585852702, 0.583984402), stringsAsFactors=FALSE)
x2$error <- x2$value-df2$value[match(x2$stream, df2$stream)]    

And then change the call to geom_point:

geom_point(data=x2[x2$stream=="Speelyai",],aes(rates,error))
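If the match() step is unclear, this toy sketch walks through it. The numbers are made up and much smaller than the real data; only the column names follow the code above.

```r
# Hypothetical per-stream reference values and long-format data.
df2 <- data.frame(stream = c("Brooks", "Siouxon", "Speelyai"),
                  value  = c(0.9, 0.6, 0.5),
                  stringsAsFactors = FALSE)
x2 <- data.frame(stream = c("Speelyai", "Brooks", "Speelyai", "Siouxon"),
                 value  = c(0.7, 1.4, 0.1, 0.6),
                 stringsAsFactors = FALSE)

# match() gives, for each row of x2, the row of df2 with the same stream.
idx <- match(x2$stream, df2$stream)   # 3 1 3 2

# Subtracting the looked-up reference value expresses each simulation
# as a deviation from its own stream's reference.
x2$error <- x2$value - df2$value[idx]
x2
```

Because the lookup is done by stream name, the order of rows in x2 does not matter.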
