Building a predictive model with Python. Projections are always 0

I am looking at some real estate data that I found online. I set up a model in Python; all of the code is shown below. The data covers the boroughs of NYC and includes fields such as zip code, lotsize, commercial, residential, and a few other metrics. I'm trying to predict a 'target' variable for potentially developing a commercial real-estate lot, based on various factors.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load data
train = pd.read_csv('C:\\Users\\Excel\\Desktop\\train.csv')
test = pd.read_csv('C:\\Users\\Excel\\Desktop\\test.csv')

df = pd.concat([train,test],axis=0) #Combined both Train and Test Data set
df.shape
pd.set_option('display.max_columns', None)


# fill in NANs.
df = df.fillna(0)

print('Data frame:', df)

# convert to numbers
df = df.select_dtypes(include=[np.number])


# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Filter the columns to remove ones we don't want to use in the training
columns = [c for c in columns if c not in ['target']]


# Store the variable we'll be predicting on.
target = 'target'
train['target'] = 0
# Generate the training set.  Set random_state to be able to replicate results.
train = df.sample(frac=0.8, random_state=1)
# Select anything not in the training set and put it in the testing set.
test = df.loc[~df.index.isin(train.index)]
# Print the shapes of both sets.
print('Training set shape:', train.shape)
print('Testing set shape:', test.shape)
# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(train[columns], train[target])


# Generate our predictions for the test set.
lin_predictions = lin_model.predict(test[columns])
print('Predictions:', lin_predictions)
# Compute error between our test predictions and the actual values.
lin_mse = mean_squared_error(lin_predictions, test[target])
print('Computed error:', lin_mse)

This line is throwing an error:

lin_model.fit(train[columns], train[target])

Here is the error message:

KeyError: 'target'

Basically, the 'target' field doesn't exist in the frame at this point: train[target]

Even when I add the field in, the projections are ALWAYS 0!!! I must be missing something simple, but I'm not sure what.

I am following the example from here, but using a completely different data set.

https://microsoft.github.io/sql-ml-tutorials/python/rentalprediction/step/2.html

I can get the 'feature importance' of factors using this snippet of code.

# Create a new matplotlib figure
fig = plt.figure()
ax = fig.add_subplot()

viz = FeatureImportances(GradientBoostingClassifier(), ax=ax)
viz.fit(X, y)
viz.poof()

I'd like to add this as a comment, but I can't yet. Why are you using linear regression to predict what I assume is a binary variable? If the target is binary, use logistic regression instead. Also, what is this line for: columns = [c for c in columns if c not in ['target']]? Where does 'target' come from? Another thing: train['target'] = 0 sets the entire column to 0; if you only want to reassign some of the values, use df.loc instead. That is why all of your predicted values are zero: the target is your dependent variable, and every value in it is set to 0.
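As for the KeyError itself, one likely cause (an assumption, since the CSV files are not shown) is the order of operations: train['target'] = 0 runs after pd.concat has already built df, and train is then overwritten with df.sample(...). A minimal sketch with made-up stand-in frames shows that the new column never reaches df:

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv (values are made up).
train = pd.DataFrame({"lotsize": [1000, 2500, 1800], "zipcode": [10001, 11201, 10451]})
test = pd.DataFrame({"lotsize": [1200, 3000], "zipcode": [10002, 11215]})

# Same order as in the question: combine first, add the column afterwards.
df = pd.concat([train, test], axis=0)
train["target"] = 0

# The column exists on the old `train` object but not on the combined frame.
in_train = "target" in train.columns
in_df = "target" in df.columns
print("in train:", in_train, "| in df:", in_df)

# Sampling from df therefore yields a frame without 'target'.
resampled = df.sample(frac=0.8, random_state=1)
try:
    resampled["target"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```

If the real target column is present in the CSV but stored as text, df.select_dtypes(include=[np.number]) would also drop it, so it is worth checking 'target' in df.columns right before the fit call.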

If all the samples in the training set have output/target = 0, as you set in the code

train['target'] = 0

then the algorithm will learn that, regardless of which features are in the model, the prediction should always be 0.

Review why you need to set the target to 0. That line seems unnecessary; try removing it and running the model again.
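To see the effect in isolation, here is a small sketch on synthetic data (the features and coefficients are made up and have nothing to do with the NYC data set). It fits LinearRegression once on an all-zero target and once on a target that actually depends on the features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # made-up features

# Case 1: every training target is 0, as after `train['target'] = 0`.
zero_model = LinearRegression().fit(X, np.zeros(50))
zero_preds = zero_model.predict(X)

# Case 2: a target that really depends on the features.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5
real_model = LinearRegression().fit(X, y)
real_preds = real_model.predict(X)

print("all-zero target gives all-zero predictions:", np.allclose(zero_preds, 0.0))
print("real target is recovered:", np.allclose(real_preds, y))
```

With a constant target there is nothing for the model to learn, so the fitted coefficients and intercept come out as zero. The model only produces useful predictions once 'target' holds the real values you want to predict.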
