Scikit-Learn issues error for RandomForestClassifier for multilabel classification - Jagged arrays

Question

Scikit-Learn RandomForestClassifier throws an error for a multilabel classification problem.

The following code fits a multilabel RandomForestClassifier on predictors C and multilabel targets out without any error:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

C = np.array([[2,4,6],[4,2,1],[8,3,1]])
out = np.array([[0,1],[0,1],[1,0]])
rf = RandomForestClassifier(n_estimators=100, oob_score=True)
rf.fit(C,out)

If I modify the multilabels so that all elements at a given index are the same (for example, the first component of every multilabel is zero):

out = np.array([[0,1],[0,1],[0,0]])

I get an error and traceback:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a 
list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. 
If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  y_pred = np.array(y_pred, copy=False)

raise ValueError(
    507             "The type of target cannot be used to compute OOB "
    508             f"estimates. Got {y_type} while only the following are "
    509             "supported: continuous, continuous-multioutput, binary, "
    510             "multiclass, multilabel-indicator."
    511         )
ValueError: could not broadcast input array from shape (2,1) into shape (2,)

Not requesting OOB predictions does not result in an error:

rf_err = RandomForestClassifier(n_estimators=100, oob_score=False)
rf_err.fit(C,out)

I cannot figure out why requesting the OOB predictions triggers this error when the n-th component is the same for every multilabel.

Answer 1

In your setup, out = np.array([[0,1],[0,1],[0,0]]), the first label never takes the value 1: you do not have any examples of the second class for that output, so it only has elements of one class.
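A quick way to confirm this is to count the distinct values in each label column. This is a NumPy-only sketch; the name n_values_per_label is just for illustration:

```python
import numpy as np

# Multilabel targets from the question: one row per sample, one column per label
out = np.array([[0, 1], [0, 1], [0, 0]])

# Count the distinct values each label column actually takes
n_values_per_label = [len(np.unique(out[:, k])) for k in range(out.shape[1])]
print(n_values_per_label)  # [1, 2]: the first label column is constant (all 0)
```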

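That means the 'class label' dimension for that output has length 1 and can be omitted, so the per-output prediction arrays no longer have matching shapes. That's why NumPy complains about ragged nested sequences and why you see the (2,) shape in the broadcasting error.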
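The shape mismatch can be observed without OOB scoring at all. The sketch below assumes a recent scikit-learn; n_estimators=10 and random_state=0 are arbitrary choices for illustration. It fits the forest with oob_score=False (which works, as noted in the question) and inspects n_classes_ and the per-output arrays returned by predict_proba:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

C = np.array([[2, 4, 6], [4, 2, 1], [8, 3, 1]])
out = np.array([[0, 1], [0, 1], [0, 0]])

# Fitting without OOB scoring works, as noted in the question
rf = RandomForestClassifier(n_estimators=10, oob_score=False, random_state=0)
rf.fit(C, out)

# One entry per output: the first output saw only one class
print(rf.n_classes_)  # [1, 2]

# For multi-output targets, predict_proba returns one array per output;
# here their widths differ (1 class vs 2 classes)
proba = rf.predict_proba(C)
print([p.shape for p in proba])  # [(3, 1), (3, 2)]
```

The traceback shows the OOB code calling np.array(y_pred, copy=False), which appears to be where a per-output list like this, with mismatched widths, turns into the ragged-array warning and the broadcasting error.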

Please describe your original intent: why do you need to set a particular label position to 0? If you want to go from N classes to N-1 classes, I suggest removing that position and the samples of that class from the dataset, rather than filling the position with zeros:

out=[[1,0,0],[0,1,0],[0,1,0],[0,0,1],[1,0,0]]  # 3 classes
# remove the second class:
out=[[1,0],[0,1],[1,0]]  # 2 classes
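The suggested relabeling can be sketched with NumPy as follows (drop and keep_rows are illustrative names). The same row mask must also be applied to the predictors, e.g. X[keep_rows] for a hypothetical feature matrix X, so that features and labels stay aligned:

```python
import numpy as np

# One-hot targets with 3 classes (5 samples), as in the example above
out = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
drop = 1  # index of the class to remove (the second class)

# Keep only the samples that do not belong to the dropped class ...
keep_rows = out[:, drop] == 0
# ... then remove that class's column
out_reduced = np.delete(out[keep_rows], drop, axis=1)
print(out_reduced.tolist())  # [[1, 0], [0, 1], [1, 0]]
```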

solution1
2 2022-12-12 10:46:26
