
How to get a scatter plot of mixture data with different shape and colour for each distribution?

I am running a simulation of mixture data. My actual distribution is more complex than a Gaussian, so I have simplified the question to the Gaussian case. Suppose I simulate mixture data like this:

N = 2000
U = runif(N, min = 0, max = 1)
X = matrix(NA, nrow = N, ncol = 2)
for (i in 1:N) {
    if (U[i] < 0.7) {
        X[i, ] <- rnorm(1, 0.5, 1)
    } else {
        X[i, ] <- rnorm(1, 3, 5)
    }
}

How can I get a scatter plot with a different colour and shape (plotting symbol) for each cluster or distribution? I would like to do this manually, since my real function is complex. I tried plot(X[,1],X[,2],col=c("red","blue")) but it does not work.

Here's what I got, but I'm not sure if this is what you are looking for, because the locations of the observations are exactly the same for both clusters.

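library(tidyverse)
# Combine the simulated matrix and the mixing variable; the columns become X.1, X.2 and U
df <- data.frame(X = X, U = U)
# Reshape to long format: one row per (U, matrix column) pair, labelled by `cluster`
df <- gather(df, key = cluster, value = X, -U)
ggplot(df, aes(x = X, y = U, colour = cluster)) +
    geom_point() +
    facet_wrap(~cluster)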

[image: output of the ggplot call above]

EDIT: I don't seem to understand what you are looking to map onto the scatter plot, so I'll show how you need to shape your data in order to create a chart like the one above with the proper X and Y coordinates:

 head(df)
            U cluster          X
 1 0.98345408     X.1  2.3296047
 2 0.33939935     X.1 -0.6042917
 3 0.66715421     X.1 -2.2673422
 4 0.06093674     X.1  2.4007376
 5 0.48162959     X.1 -2.3118850
 6 0.50780007     X.1 -0.7307929

So you want one variable for the Y coordinate (I'm using variable U here), one variable for the X coordinate (using X here), and a third variable that indicates whether the observation belongs to cluster 1 or cluster 2 (variable cluster here).
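As a side note on the reshaping step: gather() has been superseded in tidyr by pivot_longer(). Below is a minimal sketch of producing the same three-column shape with it. The simulated values and the names df_wide and df_long are placeholders for illustration only, not part of the original code.

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
N <- 2000
U <- runif(N)
# Placeholder data standing in for the simulated matrix X
X <- matrix(rnorm(2 * N), nrow = N, ncol = 2)

# Passing an unnamed matrix as X yields columns X.1 and X.2, plus U
df_wide <- data.frame(X = X, U = U)

# Long format: every column except U is stacked into `X`,
# and its former column name goes into `cluster`
df_long <- pivot_longer(df_wide, cols = -U,
                        names_to = "cluster", values_to = "X")
head(df_long)
```

Unlike gather(), pivot_longer() returns the rows interleaved (X.1, X.2, X.1, ...) rather than stacked column by column, which makes no difference to the plot.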

I think this is what you want. Note that I had to do a bit of guesswork to figure out what was going on, because your example code seems to have an error in it. You weren't generating different x1 and x2 values in each row:

N=2000
U=runif(N, min=0,max=1)
X = matrix(NA, nrow = N, ncol=2)
for (i in 1:N){
    if(U[i] < 0.7){
        # You had rnorm(n=1, ...) which gives 2 identical values in each row
        # Change that to 2 and you get different X1 and X2 values
        X[i,] <-   rnorm(2, 0.5, 1)
    } else {
        X[i,] <- rnorm(2, 3, 5)
    }
}

df = data.frame(
    source = ifelse(U < 0.7, "dist1", "dist2"),
    x = X[, 1],
    y = X[, 2]
)

library(ggplot2)
ggplot(df, aes(x = x, y = y, colour = source, shape = source)) +
    geom_point()

Result:

[image: scatter plot]
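Since the question asks for a manual approach, here is a rough base R sketch of the same idea. The original plot() call does not work because col=c("red","blue") is recycled along the points in row order, so the colours simply alternate and have nothing to do with which distribution a point came from. Indexing the colour and symbol vectors by a per-point component label fixes that. The names is_dist1, component, point_cols and point_pch, the seed, and the choice of pch values 16 and 17 are all assumptions made for this illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
N <- 2000
U <- runif(N)
is_dist1 <- U < 0.7  # TRUE where the point is drawn from the first distribution

# Vectorised version of the simulation loop: fill the rows of each component at once
X <- matrix(NA_real_, nrow = N, ncol = 2)
n1 <- sum(is_dist1)
X[is_dist1, ]  <- rnorm(2 * n1, mean = 0.5, sd = 1)
X[!is_dist1, ] <- rnorm(2 * (N - n1), mean = 3, sd = 5)

# One colour and one plotting symbol per point, looked up by component label
component  <- ifelse(is_dist1, 1L, 2L)
point_cols <- c("red", "blue")[component]
point_pch  <- c(16, 17)[component]  # 16 = filled circle, 17 = filled triangle

plot(X[, 1], X[, 2], col = point_cols, pch = point_pch,
     xlab = "x1", ylab = "x2")
legend("topright", legend = c("dist1", "dist2"),
       col = c("red", "blue"), pch = c(16, 17))
```

For a more complex simulation, the same pattern should carry over as long as you record which component each row was drawn from and use that label to index the colour and symbol vectors.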
