How to do classification on a binary data set using scikit-learn?

I have the following binary dataset:

[
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
]

I want to cluster (scatter) this into 3 groups in such a way that the arrays that are most similar (0s and 1s at the same positions in the array) end up in the same group.

I have learned that binary data cannot be clustered straight away and that its dimensions need to be reduced first. Manifold learning algorithms are capable of doing that. I am trying to reduce the data to just 2 dimensions and then scatter it on a plot to make it more user friendly. Multi-dimensional scaling (MDS) seems to be the most promising algorithm for this. But when I fit it to my dataset, I still get the same dataset back without any reduction:

mds = MDS(n_components=2, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=1, random_state=None, dissimilarity='euclidean')
mds.fit(X)
return X

Any idea what I am doing wrong or what I am missing? I am trying to reduce this dataset to 2 dimensions and then plot it on a 2D scatter graph so that similar arrays are grouped closely together, based on the similarity in the positions of their 0s and 1s.

fit() only learns from the data; it does not change it in any way, so returning X gives back the original array. You need to call fit_transform() to get the new, transformed data. Like this:

newX = mds.fit_transform(X)

newX will be the data with 2 components, as you wanted.
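
For completeness, here is a minimal end-to-end sketch that applies fit_transform() to the dataset from the question and then groups the 2D points into 3 clusters. The choice of KMeans for the clustering step, the random_state values, and the variable name labels are assumptions made for illustration; the question does not say which clustering algorithm is intended.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# The five binary rows from the question (50 features each).
X = np.array([
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
])

# fit_transform() learns the embedding AND returns the reduced coordinates.
# random_state is fixed here only so that repeated runs give the same layout.
mds = MDS(n_components=2, n_init=4, max_iter=300, random_state=0)
newX = mds.fit_transform(X)
print(newX.shape)  # one 2D point per input row

# Assumption: KMeans is used to form the 3 groups on the reduced data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(newX)
print(labels)  # cluster index (0, 1 or 2) for each row
```

The two columns of newX can then be passed to a scatter plot, coloured by labels. Note that for 0/1 vectors the squared Euclidean distance between two rows equals the number of positions where they differ, so the default Euclidean dissimilarity already reflects the position-wise similarity described in the question.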
