
How to divide 2D data into two groups

I have a data frame, Test, as shown below:

Test
           x        y
1  4324.3329 484.6496
3  3258.4572 499.9621
4  4462.8230 562.7703
7  5173.4353 572.9492
8  4188.0244 530.8349
9  3557.5385 494.6672
10 2353.1382 517.5235
11 4944.2605 537.7489
15 3335.6628 488.4479
16 4059.0555 534.5479
17 4694.1778 531.7709
18 3213.8639 496.0062
19 4119.5348 516.3399
20 4267.7457 537.1041
22 4284.2706 503.8527
23 3019.6271 498.8519
35 2549.8743 503.5473
36 4976.5386 566.5985
37 2717.9942 513.2320
38 3545.2092 448.4752
40 3352.3206 457.7265
41 3198.0481 560.4075
42 1387.7531 395.7657
43  957.6421 296.1419
44 3168.8167 489.5333
45 2717.1015 478.6760
46 3694.8913 455.2763
47 4131.9760 519.9161
48 4366.2339 502.5977
49 4314.1003 486.7103
50 3818.1977 461.5844
52 3745.0532 467.7885

I drew a scatter plot with marginal boxplots as follows:

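library(ggplot2)
library(ggExtra)  # provides ggMarginal()

gg <- ggplot(Test, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse()

# ggMarginal() returns a new object; store it and print that,
# otherwise print(gg) shows only the plain scatter plot.
gg_marginal <- ggMarginal(
  gg,
  type = "boxplot",
  margins = "both",
  size = 5
)
print(gg_marginal)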

[image]

There seem to be two groups:

(1) one at the top right, containing most of the points

(2) one at the bottom left, containing two points.

In this case, how can I divide the data into two groups?

I have tried k-means clustering as follows:

# k-means
library(cluster)
km <- kmeans(Test, 2)
clusplot(Test, km$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)

But this replaces the x and y coordinates with the principal components PC1 and PC2, which is not what I want in this case.

[image]

For example,

set.seed(42)
km <- kmeans(Test, 2)
ggplot(Test, aes(x = x, y = y, colour = factor(km$cluster))) +
  geom_point() +
  stat_ellipse(type = "norm", linetype = 2)

gives:

[image]
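One possible way to get the grouping described above while staying in the original x/y coordinates is sketched below. It is only an illustration under a few assumptions, not a definitive answer: it swaps k-means for single-linkage hierarchical clustering (hclust() and cutree() from base R's stats package), which tends to split off isolated points, and it standardizes x and y first because x is in the thousands while y is in the hundreds. The column name group is made up for this example. On the data above I would expect the smaller group to be rows 42 and 43, but check the table() output rather than taking that for granted.

```r
# Rebuild the data frame from the question (first column becomes the row names).
Test <- read.table(header = TRUE, row.names = 1, text = "
id x y
1  4324.3329 484.6496
3  3258.4572 499.9621
4  4462.8230 562.7703
7  5173.4353 572.9492
8  4188.0244 530.8349
9  3557.5385 494.6672
10 2353.1382 517.5235
11 4944.2605 537.7489
15 3335.6628 488.4479
16 4059.0555 534.5479
17 4694.1778 531.7709
18 3213.8639 496.0062
19 4119.5348 516.3399
20 4267.7457 537.1041
22 4284.2706 503.8527
23 3019.6271 498.8519
35 2549.8743 503.5473
36 4976.5386 566.5985
37 2717.9942 513.2320
38 3545.2092 448.4752
40 3352.3206 457.7265
41 3198.0481 560.4075
42 1387.7531 395.7657
43  957.6421 296.1419
44 3168.8167 489.5333
45 2717.1015 478.6760
46 3694.8913 455.2763
47 4131.9760 519.9161
48 4366.2339 502.5977
49 4314.1003 486.7103
50 3818.1977 461.5844
52 3745.0532 467.7885
")

# Standardize both columns so x does not dominate the Euclidean distances.
scaled <- scale(Test[, c("x", "y")])

# Single-linkage hierarchical clustering (assumption: swapped in for k-means).
hc <- hclust(dist(scaled), method = "single")

# Cut the tree into two groups; 'group' is a hypothetical column name.
Test$group <- factor(cutree(hc, k = 2))

# Inspect group sizes and which rows fall into each group.
print(table(Test$group))
print(split(rownames(Test), Test$group))
```

Because the labels are stored in Test itself, the ggplot call above can map them with aes(x = x, y = y, colour = group), so the axes stay x and y rather than PC1 and PC2.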
