
sklearn: Classifier to predict on a MaskedArray

I am trying to figure out how to run a classifier prediction on a numpy masked array (instead of a regular numpy array). Here is my code:

# My masked array on which to perform the prediction
>>> type(patch)
    numpy.ma.core.MaskedArray
>>> patch.shape
    (3,3,14)
# This is what the first layer along the third dimension looks like.
>>> patch[:,:,0]
    masked_array(
  data=[[90, 28, 16],
        [79, 32, --],
        [41, --, --]],
  mask=[[False, False, False],
    [False, False,  True],
    [False,  True,  True]],
 fill_value=999999,
 dtype=uint16)

The output above shows the first layer along the third dimension. There are 14 layers, as you can see from patch.shape. In each of them, the positions (1,2), (2,1) and (2,2) are masked.

Now, I use a pre-trained RandomForest classifier cl to classify the values of the patch into the class ids 1, 4 and 6. I would like the classifier to ignore the masked values during classification, but after doing:

>>> class_pred = cl.predict(patch.reshape(-1, patch.shape[2]))
>>> class_pred = class_pred.reshape(patch[:,:,0].shape)

I get:

>>> class_pred 
    array([[4, 4, 4],
           [4, 4, 1],
           [4, 1, 1]])

So the positions (1,2), (2,1) and (2,2) are no longer masked; they were classified as well.

Is there a way to force the classifier to ignore the masked values during classification, in order to obtain something like this:

masked_array(
  data=[[4, 4, 4],
        [4, 4, --],
        [4, --, --]],
  mask=[[False, False, False],
    [False, False,  True],
    [False,  True,  True]],
 fill_value=999999,
 dtype=uint16)

I think the answer right now is: scikit-learn ignores masks on the data passed to it. Whatever underlying value the masked array holds at the masked positions will be used by the classifier in fit and predict, which is why you get a class value there.
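One possible workaround, as a minimal sketch: call predict only on the unmasked pixels and write the results back into a masked array. The names predict_unmasked and ThresholdClassifier are hypothetical; ThresholdClassifier is a toy stand-in for the pre-trained cl so that the example runs without a trained model, and the values stored under the mask (0 here) are invented, since the question does not show them.

```python
import numpy as np


class ThresholdClassifier:
    """Toy stand-in for the pre-trained classifier `cl` (not a scikit-learn class)."""

    def predict(self, X):
        # Converting to a plain ndarray drops the mask, so masked entries
        # contribute whatever value is stored underneath them.
        X = np.asarray(X)
        # Class 4 when the first feature is at least 10, otherwise class 1.
        return np.where(X[:, 0] >= 10, 4, 1)


def predict_unmasked(cl, patch):
    """Classify only the unmasked pixels of a (rows, cols, layers) masked array."""
    rows, cols, layers = patch.shape
    # Treat a pixel as masked if any of its layers is masked.
    pixel_mask = np.ma.getmaskarray(patch).any(axis=2)
    # Flatten to (n_pixels, layers) and select the valid pixels.
    features = np.ma.getdata(patch).reshape(-1, layers)
    valid = ~pixel_mask.ravel()
    out = np.zeros(rows * cols, dtype=np.int64)
    if valid.any():
        out[valid] = cl.predict(features[valid])
    return np.ma.masked_array(out.reshape(rows, cols), mask=pixel_mask)


# Rebuild a patch shaped like the one in the question: 14 identical layers.
layer = np.array([[90, 28, 16], [79, 32, 0], [41, 0, 0]], dtype=np.uint16)
mask2d = np.array([[False, False, False],
                   [False, False, True],
                   [False, True, True]])
patch = np.ma.masked_array(np.repeat(layer[:, :, None], 14, axis=2),
                           mask=np.repeat(mask2d[:, :, None], 14, axis=2))

cl = ThresholdClassifier()
# Naive call from the question: every pixel gets a class, masked or not.
naive = cl.predict(patch.reshape(-1, patch.shape[2])).reshape(3, 3)
# Mask-aware call: masked pixels stay masked in the output.
result = predict_unmasked(cl, patch)
print(naive)
print(result)
```

Any fitted estimator with a predict method should be usable in place of the stand-in. The helper assumes at most one mask per pixel across layers matters, which holds for the patch in the question because all 14 layers share the same mask.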

For your specific case: how important is it that the input has a matrix structure? If these inputs are always masked (e.g. because they are triangular arrays), you might want to unravel them into vectors. People do that even for full square matrices like images (think of a ConvNet, for example).
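A small sketch of that unravelling with numpy.ma, assuming the masked positions are fixed: compressed() returns the unmasked entries as a 1-D vector, and the stored index arrays let you map a vector back onto the grid. The values under the mask are placeholders.

```python
import numpy as np

mask2d = np.array([[False, False, False],
                   [False, False, True],
                   [False, True, True]])
layer = np.ma.masked_array([[90, 28, 16], [79, 32, 0], [41, 0, 0]], mask=mask2d)

# 1-D vector of the unmasked values, in row-major order.
vector = layer.compressed()

# Remember where the valid entries live so a vector can be mapped back.
rows, cols = np.nonzero(~mask2d)
restored = np.ma.masked_all(layer.shape, dtype=vector.dtype)
restored[rows, cols] = vector  # assignment unmasks only these positions

print(vector)
print(restored)
```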

In a broader sense, if what you are doing is representing missing values, then I must say that this kind of ML is still at an embryonic stage (but advancing at a pace). I can recommend the book "Statistical Analysis with Missing Data", which has quite a few algorithms.
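If the mask really does stand for missing values, one very simple baseline is column-mean imputation. The sketch below uses only numpy.ma on made-up data; it illustrates the idea and is not a recommendation of a particular method.

```python
import numpy as np

# Toy feature matrix (samples x features); masked entries are "missing".
X = np.ma.masked_array(
    [[1.0, 10.0], [3.0, 0.0], [0.0, 30.0]],
    mask=[[False, False], [False, True], [True, False]],
)

# Masked means skip the masked entries; filled() converts to a plain ndarray.
col_means = X.mean(axis=0).filled(np.nan)

# Replace each missing entry with the mean of its column.
X_imputed = np.where(np.ma.getmaskarray(X), col_means, np.ma.getdata(X))
print(X_imputed)
```

The resulting plain array can then be passed to fit or predict without relying on whatever values happened to be stored under the mask.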

