
Getting deprecation warning in Sklearn over 1d array

I'm having a terrible time resolving the warning described in this question. Unfortunately, following the fixes suggested there has not solved my problem.

Apparently I'm feeding a 1D array into SVM.SVC predict and I'm getting deprecation warnings. I just can't figure out what I'm doing wrong and I'm hoping someone can help me fix my code. I'm sure it is a small correction I'm missing.

I'm using Python 2.7

I start with a dataframe data_df (dimensions reduced here for clarity but code and structure are accurate):

   Price/Sales  Price/Book  Profit Margin  Operating Margin
0         2.80        6.01          29.56             11.97
1         2.43        4.98          25.56              6.20
2         1.61        3.24           4.86              5.38
3         1.52        3.04           4.86              5.38
4         3.31        4.26           6.38              3.58

I change the dataframe to a numpy array:

X = data_df.values

which gives me:

[[  2.8,    6.01,  29.56,  11.97],
 [  2.43,   4.98,  25.56,   6.2 ],
 [  1.61,   3.24,   4.86,   5.38],
 [  1.52,   3.04,   4.86,   5.38],
 [  3.31,   4.26,   6.38,   3.58]]

Then I center and normalize my data:

X = preprocessing.scale(X)

which gives me:

[[ 0.67746872  1.5428404   1.39746257  1.90843628]
 [ 0.13956437  0.61025495  1.03249454 -0.10540376]
 [-1.05254797 -0.96518067 -0.85621499 -0.3915994 ]
 [-1.18338957 -1.14626523 -0.85621499 -0.3915994 ]
 [ 1.41890444 -0.04164945 -0.71752714 -1.01983373]]
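For reference, the centering and scaling step can be reproduced by hand. This is a minimal sketch assuming preprocessing.scale is called with its defaults, in which case each column is shifted to zero mean and divided by its population standard deviation; the names X_raw and X_scaled are only illustrative.

```python
import numpy as np

# The five sample rows shown in the question.
X_raw = np.array([
    [2.80, 6.01, 29.56, 11.97],
    [2.43, 4.98, 25.56, 6.20],
    [1.61, 3.24, 4.86, 5.38],
    [1.52, 3.04, 4.86, 5.38],
    [3.31, 4.26, 6.38, 3.58],
])

# Column-wise standardization: subtract each column's mean and divide by
# its standard deviation (ddof=0, i.e. the population standard deviation).
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

print(X_scaled.shape)          # still 2D: one row per sample
print(X_scaled.mean(axis=0))   # approximately 0 for every column
print(X_scaled.std(axis=0))    # approximately 1 for every column
```

Scaling keeps the array two-dimensional, so the 1D input that triggers the warning has to come from a later indexing step.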

My y is a series of 0's and 1's:

[0, 0, 1, 0, 1]

The actual data set is about 10,000 observations. I use the following code to select subsets for training, testing, and checking accuracy:

from sklearn import svm

test_size = 500

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X[:-test_size],y[:-test_size])

correct_count = 0

for x in range(1, test_size+1):
    if clf.predict(X[-x])[0] == y[-x]:
        correct_count += 1

# float() avoids integer division under Python 2.7
print("Accuracy: %.2f" % (float(correct_count) / test_size * 100.0))

Each test sample I feed into clf.predict (X[-x] for x = 1 to test_size) raises the following warning:

C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)

The code works and I do get predictions and am able to calculate accuracy but I'm still throwing the warning.

As far as I can tell from searching and the above referenced other question my data IS in the proper form. What am I missing?

Thanks in advance for your help.

You just need to do what the warning message suggests. Your variable X[-x] is 1D, but needs to be 2D. It's a single sample with multiple features, so just add .reshape(1,-1) to it and the warning clears up:

for x in range(1, test_size+1):
    if clf.predict(X[-x].reshape(1,-1))[0] == y[-x]:
        correct_count += 1
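Since predict accepts a 2D array of many samples, the loop can also be replaced by a single call on the whole test slice. The following is only a sketch: the random data, the label rule, and the smaller test_size are assumptions standing in for the asker's real data set.

```python
import numpy as np
from sklearn import svm

# Synthetic stand-in for the real data (assumption): 200 samples, 4 features.
rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

test_size = 50

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X[:-test_size], y[:-test_size])

# Slicing keeps the array 2D (test_size rows, 4 columns), so no reshape
# is needed and no 1D-array warning is raised.
predictions = clf.predict(X[-test_size:])

# Fraction of matching labels, expressed as a percentage.
accuracy = np.mean(predictions == y[-test_size:]) * 100.0
print("Accuracy: %.2f" % accuracy)
```

The same number is available from clf.score(X[-test_size:], y[-test_size:]), which returns the mean accuracy as a fraction between 0 and 1.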

The clf.predict function can predict labels for multiple samples at once, each described by multiple features. If you pass in a 1D array, it's unclear whether you meant a single sample with multiple features or multiple samples with a single feature. The warning message asks you to form the 2D array yourself to make the distinction explicit.
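The two readings of a 1D array can be seen directly from the shapes NumPy produces. A small sketch, using the first scaled row from the question as the example vector:

```python
import numpy as np

# One scaled row from the question: a flat, 1D array of four numbers.
row = np.array([0.67746872, 1.5428404, 1.39746257, 1.90843628])

# Interpretation 1: a single sample with four features.
as_single_sample = row.reshape(1, -1)

# Interpretation 2: four samples with one feature each.
as_single_feature = row.reshape(-1, 1)

print(row.shape)                # (4,)
print(as_single_sample.shape)   # (1, 4)
print(as_single_feature.shape)  # (4, 1)
```

For a classifier trained on four features, only the (1, 4) form matches what predict expects.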
