Invalid literal for Float error in Python

I am trying to perform linear regression in Python using the sklearn library.

This is the code I used to train and fit the model. I get the error when I run the predict call.

train, test = train_test_split(h1, test_size = 0.5, random_state=0)

my_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']
trainInp = train[my_features]

target = ['price']
trainOut = train[target]

regr = LinearRegression()

# Train the model using the training sets

regr.fit(trainInp, trainOut)

print('Coefficients: \n', regr.coef_)

testPred = regr.predict(test)

After fitting the model, when I try to predict on the test data, it throws the following error:

Traceback (most recent call last):
  File "C:/Users/gouta/PycharmProjects/MLCourse1/Python.py", line 52, in <module>
    testPred = regr.predict(test)
  File "C:\Users\gouta\Anaconda2\lib\site-packages\sklearn\linear_model\base.py", line 200, in predict
    return self._decision_function(X)
  File "C:\Users\gouta\Anaconda2\lib\site-packages\sklearn\linear_model\base.py", line 183, in _decision_function
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File "C:\Users\gouta\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 393, in check_array
    array = array.astype(np.float64)
ValueError: invalid literal for float(): 20140604T000000

The coefficients for the Linear Regression Model are

('Coefficients: \n', array([[ -5.04902429e+04,   5.23550164e+04,   2.90631319e+02,
         -1.19010351e-01,  -1.25257545e+04,   6.52414059e+02]]))

The following are the first five lines of the test dataset:

[Image: test dataset]

Is the error caused by the large coefficient values? How can I fix this?

Your problem is that you fit the model on a selected set of features from the dataframe (trainInp = train[my_features]), but you then predict on the complete dataframe (regr.predict(test)), which includes non-numeric columns such as date. The value 20140604T000000 in the error message is a date string that cannot be converted to a float; the size of the coefficients has nothing to do with it.
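To see the failure in isolation, here is a minimal sketch that mimics the float64 conversion shown in the traceback. The two-row dataframe and the helper to_float_array are made up for illustration; only the column names and the date format come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame mimicking the question's data: a string date
# column next to numeric columns.
test = pd.DataFrame({
    "date": ["20140604T000000", "20150225T000000"],
    "bedrooms": [3, 4],
    "sqft_living": [1180, 2570],
})


def to_float_array(frame):
    """Illustrative helper: convert a dataframe to a float64 array,
    similar to the astype(np.float64) step in the traceback."""
    return np.asarray(frame).astype(np.float64)


# With the date column present, the conversion raises ValueError.
try:
    to_float_array(test)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("Conversion failed:", exc)

# With only numeric columns selected, the same conversion succeeds.
numeric_only = to_float_array(test[["bedrooms", "sqft_living"]])
print(numeric_only.shape)
```

The exact wording of the error message depends on the Python and NumPy versions in use.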

So instead of regr.predict(test), call regr.predict(test[my_features]). More generally, whatever preprocessing you apply to the training set (normalization, feature selection, PCA, ...) must also be applied to the test set.
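A minimal sketch of the corrected flow, assuming a recent scikit-learn where train_test_split lives in sklearn.model_selection (older releases may import it from elsewhere). The dataframe h1 here is synthetic, with a shortened feature list, and stands in for the asker's data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's h1 dataframe (values are made up).
rng = np.random.RandomState(0)
n = 40
h1 = pd.DataFrame({
    "date": ["20140604T000000"] * n,  # non-numeric column, as in the question
    "bedrooms": rng.randint(1, 6, size=n),
    "bathrooms": rng.randint(1, 4, size=n),
    "sqft_living": rng.randint(500, 4000, size=n),
})
h1["price"] = 100.0 * h1["sqft_living"] + 5000.0 * h1["bedrooms"]

my_features = ["bedrooms", "bathrooms", "sqft_living"]
train, test = train_test_split(h1, test_size=0.5, random_state=0)

regr = LinearRegression()
regr.fit(train[my_features], train["price"])

# Select the same columns at prediction time that were used for fitting.
testPred = regr.predict(test[my_features])
print(testPred.shape)
```

Keeping the feature list in a single variable and reusing it for both fit and predict makes it harder for the two column sets to drift apart.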

Alternatively, you could cut down to the set of features of interest before you do the train-test split:

my_features = ['bedrooms', 'bathrooms', ...]
train, test = train_test_split(h1[my_features], test_size = 0.5, random_state=0)
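One caveat with splitting h1[my_features] alone is that the price column is no longer part of the split, so the target has to be split alongside the features. A possible sketch, assuming a recent scikit-learn and a made-up h1 with a shortened feature list, passes both to train_test_split so the rows stay aligned:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's h1 dataframe (values are made up).
h1 = pd.DataFrame({
    "date": ["20140604T000000"] * 8,
    "bedrooms": [2, 3, 3, 4, 2, 5, 4, 3],
    "sqft_living": [900, 1200, 1500, 2100, 1000, 3000, 2400, 1600],
    "price": [200e3, 260e3, 310e3, 430e3, 215e3, 610e3, 480e3, 330e3],
})
my_features = ["bedrooms", "sqft_living"]

# Splitting features and target in one call keeps their rows aligned.
X_train, X_test, y_train, y_test = train_test_split(
    h1[my_features], h1["price"], test_size=0.5, random_state=0
)

regr = LinearRegression().fit(X_train, y_train)
testPred = regr.predict(X_test)  # X_test already holds only the features
print(testPred.shape)
```

With this layout the non-numeric date column never reaches the model, so no column selection is needed at prediction time.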
