How to make a scatterplot with two categorical variables on the x-axis in R
I am trying to make a scatterplot in R with two categorical variables on the x-axis. For a boxplot I know how to do this (see the first part of the code below), but I cannot get it to work for a scatterplot. I have tried several things, but when I plot the points they always overlap and no longer show my second categorical variable. Jitter doesn't work either, since I want the points to cluster by category rather than spread out randomly. Does anyone know how to do this?
Below you can find some sample data and the graphs I tried, including comments. The first graph gives me something similar to what I want, but as a boxplot instead of a scatterplot. The second graph gives a scatterplot (by artificially creating numbers for the second categorical variable), but then I lose the labels for my second categorical variable and both times are plotted in the same space.
To make it even more complicated, I would also like to display a line marking the mean value alongside the points, similar to what is done in "Categorical scatter plot with mean segments using ggplot2 in R". How can I add this?
Thanks for all your help!
library(ggplot2)  # needed for the ggplot() calls below
time = c(rep('t1',12),rep('t2',12))
Origin = c(rep('I1B',4),rep('I1C',4),rep('J4A',4),rep('I1B',4),rep('I1C',4),rep('J4A',4))
LB_FR = runif(24)
df = data.frame(time,Origin,LB_FR)
# boxplot version: dodges by Origin within each time (the same call with geom_point() overlaps the groups)
ggplot(df, aes(x = time, y = LB_FR, fill = Origin)) + geom_boxplot() + ggtitle('LB_FR')
# create df_2 with numbers instead of categories for Origin
df_2 = df
df_2$OriginNr = match(df$Origin, c('I1B','I1C','J4A'))
# indices for time
t1 = df_2$time=="t1"
t2 = df_2$time=="t2"
plot(df_2$OriginNr,df$LB_FR,
xlim = c(0,4), ylim = c(0,1), bty = 'n',
main = 'LB_FR', ylab = 'Fraction remaining', xlab = 'Origin', type = 'n')
points(df_2$OriginNr[t1],df_2$LB_FR[t1],col='red')
points(df_2$OriginNr[t2],df_2$LB_FR[t2],col='blue')
legend(0.1,0.9,legend=c('month 0-6','month 6-12'),pch=1,col=c('red','blue'),bty='n',cex=1.2)
The default "position" for geom_boxplot is a dodged position. You can emulate this with geom_point as well:
ggplot(df, aes(x = time, y = LB_FR, color = Origin)) +
geom_point(position = position_dodge(width = 0.4))
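To see roughly what the dodge does, the x positions can also be computed by hand, which carries over to the base-graphics attempt in the question. This is only a sketch under assumptions: the seed, the names `dodge_width` and `x_pos`, and the even-spacing formula are illustrative choices and are not taken from the question or from ggplot2's internals.

```r
# Sample data in the same shape as the question (the seed is an added assumption)
set.seed(1)
df <- data.frame(
  time   = rep(c("t1", "t2"), each = 12),
  Origin = rep(rep(c("I1B", "I1C", "J4A"), each = 4), times = 2),
  LB_FR  = runif(24)
)

time_f   <- factor(df$time)
origin_f <- factor(df$Origin)

# Spread the Origin levels evenly over a band of total width dodge_width,
# centred on the integer position of each time level
dodge_width <- 0.4
n_origin    <- nlevels(origin_f)
offset      <- (as.integer(origin_f) - (n_origin + 1) / 2) * dodge_width / n_origin
df$x_pos    <- as.integer(time_f) + offset

# Plot on the numeric positions, then relabel the axis with the time levels
plot(df$x_pos, df$LB_FR, col = as.integer(origin_f),
     xaxt = "n", xlim = c(0.5, 2.5), ylim = c(0, 1), bty = "n",
     xlab = "time", ylab = "LB_FR", main = "LB_FR")
axis(1, at = seq_len(nlevels(time_f)), labels = levels(time_f))
legend("topright", legend = levels(origin_f),
       col = seq_len(n_origin), pch = 1, bty = "n")
```

Each time level keeps its own tick label, and the legend carries the second categorical variable, so neither label is lost.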
I would recommend keeping your questions focused: instead of "making your question even more complicated", ask a new question for the mean-line thing.
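For reference, one possible way to get a mean line per group is to add a `stat_summary()` layer that uses the same dodge as the points. This is a hedged sketch, not the answer's own method: it assumes ggplot2 3.3.0 or later (for the `fun`, `fun.min` and `fun.max` arguments), and the seed, the `dodge` object and the crossbar width are illustrative choices.

```r
library(ggplot2)

# Sample data in the same shape as the question (the seed is an added assumption)
set.seed(1)
df <- data.frame(
  time   = rep(c("t1", "t2"), each = 12),
  Origin = rep(rep(c("I1B", "I1C", "J4A"), each = 4), times = 2),
  LB_FR  = runif(24)
)

# Share one dodge specification so the points and the mean bars line up
dodge <- position_dodge(width = 0.4)

p <- ggplot(df, aes(x = time, y = LB_FR, color = Origin)) +
  geom_point(position = dodge) +
  # Setting min and max to the mean collapses the crossbar to a single line
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.3, position = dodge) +
  ggtitle("LB_FR")

print(p)
```

Because both layers inherit `color = Origin`, the summary is computed once per time and Origin combination, giving six mean lines for the sample data.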