How to cluster a binary dataset using scikit-learn?

Question

I have the following binary dataset:

[
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
]

I want to cluster (scatter) this into 3 groups so that the arrays that are most similar (0s and 1s at the same positions) end up in the same group.
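The similarity described above (the same bit at the same position) can be made concrete as the fraction of positions on which two rows agree, which is 1 minus the normalized Hamming distance. This is a minimal NumPy sketch that assumes the five rows above as input. The name `agreement` is illustrative and not from the question:

```python
import numpy as np

# The five rows from the question (50 binary features each).
X = np.array([
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
])

# agreement[i, j] = fraction of positions where rows i and j hold the same bit
# (equivalently, 1 minus the normalized Hamming distance between the rows).
agreement = (X[:, None, :] == X[None, :, :]).mean(axis=2)

print(np.round(agreement, 2))
```

The first two rows are identical, so their entry in `agreement` is 1.0. Higher values mean "more similar" in exactly the positional sense the question describes.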

I have read that binary data cannot be clustered directly and that its dimensionality must be reduced first. Manifold learning algorithms can do that. I am trying to reduce it to as few as 2 dimensions and then show it as a scatter plot to make it more user-friendly; multidimensional scaling (MDS) seems to be the most promising algorithm for this. But when I fit it to my dataset, I still get back the same dataset, unreduced:

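from sklearn.manifold import MDS

def reduce_dimensions(X):  # hypothetical wrapper; the question shows only the function body
    mds = MDS(n_components=2, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=1, random_state=None, dissimilarity='euclidean')
    mds.fit(X)
    return X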

Any idea what I am doing wrong or what I am missing? I am trying to reduce this dataset to 2 dimensions and then cluster it on a 2-D scatter plot, so that similar arrays are grouped close together based on how closely the positions of their 0s and 1s match.

Answer 1

fit() only learns from the data; it does not change it in any way, so returning X afterwards just gives you back the original input. You need to call fit_transform() to get the transformed data (MDS has no separate transform() method), like this:

newX = mds.fit_transform(X)

newX will be the data reduced to the 2 components you wanted, an array of shape (n_samples, 2).
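The answer can be assembled into a minimal end-to-end sketch. It assumes the question's five rows and uses MDS defaults except for `n_components`, `n_init` and `random_state`. It also adds KMeans with 3 clusters on the 2-D output as one illustrative way to form the three groups; the answer itself does not specify a clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# The five rows from the question.
X = np.array([
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
])

mds = MDS(n_components=2, n_init=4, random_state=0)

# fit_transform() returns the 2-D embedding; X itself is left unchanged.
newX = mds.fit_transform(X)
print(newX.shape)  # (5, 2)

# The same coordinates are also stored on the fitted estimator as embedding_.
assert np.allclose(newX, mds.embedding_)

# Illustrative choice: group the embedded points into 3 clusters with KMeans.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(newX)
print(labels)
```

To draw the scatter plot the question asks for, the two columns newX[:, 0] and newX[:, 1] can be passed to matplotlib's plt.scatter, for example with c=labels to colour the points by cluster.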

Answer 1: score 0, posted 2018-01-18 02:13:38
