How can I drop the index of a Pandas Series (pandas.core.series.Series) to return a numpy.ndarray?

Question

I'm trying to show a confusion matrix for predicted test data (binary text classification), but I can't get y_pred to match the format of y_test after running model.predict().

First, let's look at the test/true data:

y_test = (y_test > 0.5)
print(y_test)
print(type(y_test))

Output:

2       False
17       True
18       True
...
4980     True
4986    False
4990     True
pandas.core.series.Series

The index labels missing from this output belong to the training set.

Here's what happens when we predict based on test data:

y_pred = model.predict(data_test)
y_pred = (y_pred > 0.5)
print(y_pred)
print(type(y_pred))

Output:

[[ True]
 [ True]
 [ True]
 [False]
 ...
 [ True]
 [ True]
 [ True]]
numpy.ndarray

Ultimately I'm looking to build a confusion matrix, but the two results aren't in the same format.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

What do you recommend?

Attempts so far:

y_test_np = y_test.values

Output:

[False  True  True ... True False  True]

Closer, but it looks like I need each item to also be an array (e.g. [[ True] [False] [ True]]). How can I align the arrays?

Answer 1

For illustration, let's create some sample data:

import numpy as np
import pandas as pd

y_test = pd.Series([True, False])
y_pred = np.array([[True], [False]])

You can convert the pandas Series y_test to a NumPy array, which drops the index:

y_test.values

and squeeze the NumPy array y_pred to obtain the same one-dimensional shape:

np.squeeze(y_pred)
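Putting the two steps together, here is a minimal sketch of the whole path from mismatched shapes to a confusion matrix. The sample values and the names y_true_1d and y_pred_1d are made up for illustration. The Series is given a non-contiguous index to mimic what a train/test split leaves behind. It uses Series.to_numpy(), which does the same job as .values here, and passes axis=1 to squeeze so that only the trailing length-1 axis is removed.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical sample data: a boolean Series with a gappy index and
# predictions shaped (n, 1), as a model's predict() might return them.
y_test = pd.Series([False, True, True, False], index=[2, 17, 18, 4986])
y_pred = np.array([[True], [True], [True], [False]])

# Drop the index: the result is a plain 1-D ndarray of shape (4,).
y_true_1d = y_test.to_numpy()

# Remove only the trailing length-1 axis: (4, 1) -> (4,).
y_pred_1d = np.squeeze(y_pred, axis=1)

# Both arrays now have the same shape and can be compared positionally.
assert y_true_1d.shape == y_pred_1d.shape

# Rows are true labels, columns are predicted labels (False, then True).
cm = confusion_matrix(y_true_1d, y_pred_1d)
print(cm)
```

Specifying axis=1 is a small safety choice: a bare squeeze on a single-row prediction of shape (1, 1) would collapse it to a zero-dimensional array, whereas axis=1 always leaves a 1-D result. y_pred.ravel() would work equally well.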
