K-means clustering with multiple dimensions

Question

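I am studying the k-means clustering algorithm. There are plenty of examples available, but I can't find one that explains what I'm trying to do.

My dataset contains customers and the items they bought. Each 1 in the dataset means the customer bought that item; a 0 means they did not.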

0,0,0,0,1,0,1,0,0,1
1,0,0,1,0,0,0,1,1,0
1,1,0,0,0,0,0,0,0,0
0,0,0,0,1,1,1,0,0,0

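From left to right are the different items. From top to bottom are the customers. I want to cluster the customers. So there are 4 dimensions in the dataset and there will be 10 points.

Now I'm trying to create points from this dataset for the next step. I want to create a list containing all the point objects and then assign them to the right cluster, but when I create a point object I don't know how to handle the 4 different dimensions.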

class Point
{
    public int ClusterNumber { get; set; }
    public int X { get; set; }
    public int Y { get; set; }

    public Point(int clusterNumber, int CustomerId, int ProductId)
    {
        ClusterNumber = clusterNumber;
        X = CustomerId;
        Y = ProductId;
    }
}

Answer 1

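In this particular k-means problem, one point is the set of products a customer bought. You have four customers, each with a list of purchased items, so it could look like this: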

public class CustomerPoint
{
    public int CustomerId { get; set; }
    public ISet<int> ProductIds { get; set; }
}
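
As a rough sketch of how the rows from the question could be turned into such points: the helper below is hypothetical (the name CustomerPointLoader.FromRows is not from the question or the answer), and it assumes the row index serves as the customer ID and the column index as the product ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomerPoint
{
    public int CustomerId { get; set; }
    public ISet<int> ProductIds { get; set; }
}

public static class CustomerPointLoader
{
    // Hypothetical helper: turns rows such as "0,0,0,0,1,0,1,0,0,1" into points.
    // Assumption: row index = customer ID, column index = product ID.
    public static List<CustomerPoint> FromRows(IEnumerable<string> rows)
    {
        return rows
            .Select((row, customerId) => new CustomerPoint
            {
                CustomerId = customerId,
                // Keep only the column indexes whose cell is "1" (bought).
                ProductIds = new HashSet<int>(
                    row.Split(',')
                       .Select((cell, productId) => new { cell, productId })
                       .Where(x => x.cell.Trim() == "1")
                       .Select(x => x.productId))
            })
            .ToList();
    }
}
```

With the four rows from the question this would give four CustomerPoint objects, each holding the column indexes of the items that customer bought.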

The cluster points would then be abstract points like this (not to be confused with the C# abstract keyword):

public class ClusterPoint
{
    public int ClusterNumber { get; set; }
    public IDictionary<int, float> ProductWeights { get; set; }
}

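ProductWeights is a dictionary that maps a ProductId to a value between 0 and 1 (both inclusive) indicating whether the product was bought. The distance between a ClusterPoint and a CustomerPoint is based on the difference between the product weight and whether the customer bought the item. This difference is computed for every product, and the sum of the squared differences is the total distance you have to minimize. When you have two cluster points CLP1(0.4, 0.1, 0.8, 0.5) and CLP2(0.2, 0.7, 0.9, 0.9) and a customer CUP(0, 1, 1, 0), the differences are as follows: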

CLP1:
    |0 - 0.4|² = 0.16
    |1 - 0.1|² = 0.81
    |1 - 0.8|² = 0.04
    |0 - 0.5|² = 0.25
               ------
                 1.26
CLP2:
    |0 - 0.2|² = 0.04
    |1 - 0.7|² = 0.09
    |1 - 0.9|² = 0.01
    |0 - 0.9|² = 0.81
               ------
                 0.95

So the customer is "closer" to the second cluster point and is therefore assigned to it.
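
The calculation above could be sketched in C# roughly as follows. The class and method names (ClusterAssignment, Distance, Nearest) are made up for illustration, and the sketch assumes that every cluster's weight dictionary has an entry for every product ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClusterAssignment
{
    // Sum of squared differences between a cluster's product weights
    // and a customer's purchases (1 if bought, 0 if not).
    // Assumption: productWeights contains a key for every product.
    public static double Distance(IDictionary<int, float> productWeights, ISet<int> purchased)
    {
        double total = 0;
        foreach (var pair in productWeights)
        {
            double bought = purchased.Contains(pair.Key) ? 1.0 : 0.0;
            double diff = bought - pair.Value;
            total += diff * diff;
        }
        return total;
    }

    // Index of the cluster with the smallest distance to the customer.
    // On a tie, the earlier cluster wins.
    public static int Nearest(IList<IDictionary<int, float>> clusters, ISet<int> purchased)
    {
        int best = 0;
        double bestDistance = double.MaxValue;
        for (int i = 0; i < clusters.Count; i++)
        {
            double d = Distance(clusters[i], purchased);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

For CLP1, CLP2 and CUP from the example (products numbered 0 to 3, so CUP bought products 1 and 2), Distance should come out at about 1.26 and 0.95, up to float rounding, and Nearest should pick the second cluster.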

You could also change the CustomerPoint.ProductIds property to an IDictionary<int, float> and use the values 1 and 0 for whether the item was bought. But that is an implementation detail.
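
A minimal sketch of that dictionary variant, under some assumptions: product IDs run from 0 to productCount - 1 (productCount is a parameter introduced here for illustration), and the names WeightVectors, ToWeights and Mean are hypothetical. The Mean method shows the usual k-means update step, which the answer does not spell out: once customers and clusters share the same representation, a cluster's weights can be recomputed as the per-product average of the customers assigned to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeightVectors
{
    // Expands a set of purchased product IDs into a 1/0 weight per product.
    // Assumption: product IDs run from 0 to productCount - 1.
    public static IDictionary<int, float> ToWeights(ISet<int> purchased, int productCount)
    {
        return Enumerable.Range(0, productCount)
            .ToDictionary(id => id, id => purchased.Contains(id) ? 1f : 0f);
    }

    // Standard k-means update step: per-product mean over the customers
    // currently assigned to a cluster.
    // Throws InvalidOperationException if members is empty.
    public static IDictionary<int, float> Mean(IList<IDictionary<int, float>> members, int productCount)
    {
        return Enumerable.Range(0, productCount)
            .ToDictionary(id => id, id => members.Average(m => m[id]));
    }
}
```

Averaging 1/0 values this way is also what produces fractional weights such as 0.4 or 0.8 in the cluster points.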

Solution 1
1 2018-05-09 20:25:19