
Data transformation with sklearn.preprocessing in Python

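I wrote code for polynomial regression in Python with sklearn. I used preprocessing and PolynomialFeatures to transform the data. Is it possible to use preprocessing to transform the data so that I can do logarithmic regression instead? I have looked everywhere and found nothing. Here is the polynomial regression code; my question is how to change it for logarithmic regression: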

import numpy as np
import pandas as pd
import xlrd  # not called directly; pandas needs it installed to read .xls files
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures


# Read the data set from Excel
data = pd.read_excel("DataSet.xls").round(2)
data_size = data.shape[0]
#print("Number of data:",data_size,"\n",data.head())

def polynomial_prediction_of_future_strength(input_data, cement, blast_fur_slug,fly_ash,
                                              water, superpl, coarse_aggr, fine_aggr, days):

    # prediction_accuracy() is defined elsewhere in the asker's code (not shown);
    # judging by its use here, items [2] and [3] are the features and the target
    accuracy_data = prediction_accuracy(input_data)  # call it once instead of twice
    variables = accuracy_data[2]
    results = accuracy_data[3]
    n = results.shape[0]
    results = results.values.reshape(n, 1)  # target as an (n, 1) column, so predict() returns a 2-D array (hence [0, 0] below)

    # expand the 8 inputs into all degree-2 polynomial terms
    Poly_Regression = PolynomialFeatures(degree=2)
    poly_variables = Poly_Regression.fit_transform(variables)

    # hold out 30% of the rows to measure prediction quality
    poly_var_train, poly_var_test, res_train, res_test = train_test_split(poly_variables, results, test_size = 0.3, random_state = 4)

    input_values = [cement, blast_fur_slug, fly_ash, water, superpl, coarse_aggr, fine_aggr, days]
    input_values = Poly_Regression.transform([input_values]) #transforming the data for prediction in polynomial function

    regression = linear_model.LinearRegression() #making the linear model
    model = regression.fit(poly_var_train, res_train) #fitting polynomial data to the model

    predicted_strength = regression.predict(input_values) #strength prediction
    predicted_strength = round(predicted_strength[0,0], 2)

    score = model.score(poly_var_test, res_test)  # R^2 on the test set (coefficient of determination, not a classification accuracy)
    score = round(score*100, 2)

    accuracy_info = "Accuracy of concrete class prediction: " + str(score) + " %\n"
    prediction_info = "Prediction of future concrete class after "+ str(days)+" days: "+ str(predicted_strength) 

    info = "\n" + accuracy_info + prediction_info

    return info

#print(polynomial_prediction_of_future_strength(data, 214.9 , 53.8, 121.9, 155.6, 9.6, 1014.3, 780.6, 7))

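If you want the change to be as smooth as possible, the best way is to define your own estimator in the scikit-learn style. You can find more information here.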
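The reason a custom transformer is enough: assuming "logarithmic regression" here means a model that is linear in the logarithms of the eight inputs (the usual reading, though the question does not define it), it is still an ordinary LinearRegression and only the feature transform changes. Side by side with the degree-2 polynomial model from the question:

```latex
\begin{aligned}
\text{degree-2 polynomial:}\quad & \hat{y} = \beta_0 + \sum_{j=1}^{8} \beta_j x_j + \sum_{1 \le j \le k \le 8} \beta_{jk}\, x_j x_k \\
\text{logarithmic:}\quad & \hat{y} = \beta_0 + \sum_{j=1}^{8} \beta_j \ln x_j, \qquad x_j > 0
\end{aligned}
```

So a transformer that maps each x_j to ln x_j takes over the role PolynomialFeatures played, and the fitting code stays the same.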

Here is one possibility:

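import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LogarithmicFeatures(BaseEstimator, TransformerMixin):
    """Replace every feature x by its natural logarithm ln(x)."""

    def __init__(self):
        pass  # no hyperparameters to store

    def fit(self, X, y=None):
        X = np.asarray(X)  # accept lists and DataFrames as well as arrays
        self.n_features_ = X.shape[1]  # remember the column count seen during fit
        return self

    def transform(self, X, y=None):
        X = np.asarray(X)
        if X.shape[1] != self.n_features_:
            raise ValueError("X must have {:d} columns".format(self.n_features_))
        # all values must be > 0: np.log(0) gives -inf and negative values give nan
        return np.log(X)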
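One caveat, stated as an assumption about the data: np.log(0) is -inf, so if any column of DataSet.xls contains zeros (mix components such as fly ash or superplasticizer may plausibly be zero for some mixes), LinearRegression will raise an error on the transformed data. A minimal sketch of a common workaround uses scikit-learn's built-in FunctionTransformer with np.log1p, i.e. ln(1 + x), instead of a custom class. Note that this is a slightly different model than plain ln(x). The tiny array below is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Made-up rows with some zero entries (illustration only, not the real data set)
X = np.array([[214.9, 0.0, 121.9],
              [540.0, 0.0, 0.0]])

# Plain ln(x) turns every zero into -inf (errstate only silences the warning)
with np.errstate(divide="ignore"):
    X_log = np.log(X)
print(np.isinf(X_log).sum())  # 3

# ln(1 + x) is finite for every x >= 0; validate=True converts the input to a 2-D array
log1p_features = FunctionTransformer(np.log1p, validate=True)
X_log1p = log1p_features.fit_transform(X)
print(X_log1p)
```

Because FunctionTransformer is itself a scikit-learn transformer, it can be dropped in wherever LogarithmicFeatures would be used, with the same fit_transform and transform calls.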

Then you can plug LogarithmicFeatures into your code like this:

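lf = LogarithmicFeatures()
log_variables = lf.fit_transform(variables)  # replaces Poly_Regression.fit_transform(variables)
# and for the single prediction, replacing Poly_Regression.transform([input_values]):
input_values = lf.transform(np.array([input_values]))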
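Putting the pieces together, here is a minimal end-to-end sketch of how the question's function could look with the log transform instead of PolynomialFeatures. Assumptions: the data below are synthetic (three positive features with made-up coefficients), standing in for DataSet.xls and the asker's prediction_accuracy() helper, which are not shown. The transformer is the LogarithmicFeatures class from the answer, with np.asarray added so that a plain list is accepted. make_pipeline is scikit-learn's helper for chaining the transformer and LinearRegression, so the same transform is applied in fit, score and predict.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


class LogarithmicFeatures(BaseEstimator, TransformerMixin):
    """Replace every feature x by ln(x); all inputs must be > 0."""

    def fit(self, X, y=None):
        self.n_features_ = np.asarray(X).shape[1]
        return self

    def transform(self, X, y=None):
        X = np.asarray(X)
        if X.shape[1] != self.n_features_:
            raise ValueError("X must have {:d} columns".format(self.n_features_))
        return np.log(X)


# Synthetic stand-in for the real data: y is linear in ln(x) plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 100.0, size=(200, 3))
true_coef = np.array([3.0, -1.5, 0.5])
y = 2.0 + np.log(X) @ true_coef + rng.normal(0.0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

# The log transform and the regression are fitted together as one model
model = make_pipeline(LogarithmicFeatures(), LinearRegression())
model.fit(X_train, y_train)

score = model.score(X_test, y_test)  # R^2 on the held-out rows
coef = model.named_steps["linearregression"].coef_  # fitted beta_j for each ln(x_j)
prediction = model.predict([[10.0, 20.0, 30.0]])[0]  # one new sample given as a plain list
print(round(score, 3), np.round(coef, 2), round(prediction, 2))
```

The design choice here is that wrapping both steps in a pipeline removes the manual transform of the prediction input, which the question's code does by hand with Poly_Regression.transform([input_values]). With the real data the features would be the eight mix columns, and they must all be positive (or use the log1p variant).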


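Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.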

 
© 2020-2024 STACKOOM.COM