How to do classification in a binary data set using scikit-learn?
I have the following binary dataset:
[
[1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
[1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
]
I want to cluster (scatter) this into 3 groups in such a way that the arrays with the most similarity (0s and 1s at the same positions in the array) end up in the same group.
I have learned that binary data cannot be clustered straight away and that its dimensionality needs to be reduced first. Manifold learning algorithms are capable of doing that. I am trying to reduce it to as few as 2 dimensions and then scatter it on a plot to make it more user friendly. Multi-dimensional scaling seems to be the most promising algorithm for this. But when I fit it to my dataset, it still returns the same dataset without any reduction:
from sklearn.manifold import MDS
mds = MDS(n_components=2, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=1, random_state=None, dissimilarity='euclidean')
mds.fit(X)
return X
Any idea what I am doing wrong or what I am missing? I am trying to reduce this data set to 2 dimensions and then plot it on a 2D scatter graph so that arrays that are similar, based on the positions of their 0s and 1s, are grouped closely together.
fit() only learns from the data and does not change it in any way. You need to call fit_transform() to get the new, transformed data. Like this:
newX = mds.fit_transform(X)
newX will be the data with 2 components, as you wanted.
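For a fuller picture, here is a minimal sketch that runs fit_transform() on the five rows from the question and then splits the 2D points into 3 groups. The grouping step uses KMeans, which is my own assumption and not something the question or the answer above specifies; any clustering method that works on 2D points could be swapped in. The random_state values and the names X_before and labels are likewise illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# The five binary rows from the question (50 features each).
X = np.array([
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
])
X_before = X.copy()

# fit_transform() learns the embedding AND returns the reduced coordinates.
# random_state=0 is an arbitrary choice to make the run repeatable.
mds = MDS(n_components=2, n_init=4, random_state=0)
newX = mds.fit_transform(X)

# Assumption: KMeans is used here only to illustrate the "3 groups" step.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(newX)

print(newX.shape)                     # the reduced data has 2 columns per row
print(np.array_equal(X, X_before))    # the original X is left untouched
print(labels)                         # one cluster label per input row
```

The two columns of newX can then be passed as the x and y coordinates to a scatter plot, with labels used to colour the points.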