
How to improve accuracy_score in machine learning (Python) for this regression problem?

I am a beginner in machine learning, and as part of learning I chose the Student Performance dataset from UCI. I want to predict a student's final grade (G3) from the given features.

I first tried using the two main, highly correlated features G1 and G2, which are the grades of two exams. I used the LinearRegression algorithm and got an accuracy of 0.4 or less.

Then I applied feature engineering (one-hot encoding with pd.get_dummies) to all the object-typed columns in the DataFrame, but the accuracy stayed the same.
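For readers unfamiliar with the encoding step, here is a minimal sketch of what `pd.get_dummies` does to object columns. The rows are hypothetical; only the column names `age`, `school` and `sex` follow the dataset's schema.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset's schema
df = pd.DataFrame({
    "age": [15, 16, 17],
    "school": ["GP", "MS", "GP"],
    "sex": ["F", "M", "M"],
})

# get_dummies leaves numeric columns alone and replaces each object column
# with one indicator column per distinct value
encoded = pd.get_dummies(df)
print(encoded.columns.tolist())
```

Because each group of indicator columns sums to 1, `pd.get_dummies(df, drop_first=True)` is sometimes used with linear models to drop one redundant column per category.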

How can I improve the accuracy score?

My code, from a Python notebook:

from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd  # needed for read_csv, concat and get_dummies
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR

from sklearn.metrics import mean_squared_error, mean_absolute_error, median_absolute_error, accuracy_score

# Load both course files (math and Portuguese); they are ';'-separated
df = pd.read_csv('student-mat.csv', sep=';')
df2 = pd.read_csv('student-por.csv', sep=';')

# Stack them into one DataFrame and one-hot encode the object (categorical) columns
df = pd.concat([df, df2])
df = pd.get_dummies(df)

# G3 (final grade) is the target; every other column is a feature
X = df.drop('G3', axis=1)
y = df['G3']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# Round predictions to whole grades so accuracy_score can compare them with y_test
y_pred = [int(round(i)) for i in y_pred]

accuracy_score(y_test, y_pred)

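Accuracy calculated on continuous variables is not very useful: accuracy_score only counts predictions that exactly equal the true value, so on a regression model it measures how often the rounded prediction hits the exact grade, not how close the predictions are. Use the mean squared error instead, which is the relevant metric for continuous output.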
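A small sketch on made-up numbers (not taken from the dataset) contrasting the rounded exact-match accuracy with regression metrics from `sklearn.metrics`. `r2_score` (the coefficient of determination, which is also what `LinearRegression.score` returns) is an extra metric not mentioned in the answer above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error, r2_score

# Made-up grades and predictions, for illustration only
y_true = np.array([10, 12, 15, 8])
y_pred = np.array([10.4, 11.6, 14.2, 9.1])

# accuracy_score needs discrete labels, so predictions are rounded first;
# it then counts only exact matches
acc = accuracy_score(y_true, np.rint(y_pred).astype(int))

# Regression metrics use the raw predictions and measure how far off they are
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print(f"accuracy={acc:.2f} mse={mse:.4f} mae={mae:.3f} r2={r2:.3f}")
```

Here every prediction is within about one grade point of the truth, yet the exact-match accuracy is only 0.5, while MSE and MAE stay small and R² stays close to 1.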

As for improving your model, try the different tools at your disposal to identify the most relevant features. I recommend the statsmodels API ( https://www.statsmodels.org/stable/regression.html ) for a more in-depth analysis, since its regression results report each coefficient together with its standard error and p-value.
