Gradient descent using many polynomials is not converging

Context: I am trying to create a generic function that minimizes the cost of any regression problem using polynomial regression of any specified degree. I am trying to fit my model to the load_boston dataset (house price as the label, 13 features).

I tried several polynomial degrees, learning rates, and numbers of epochs (with gradient descent), and the MSE comes out very high even on the training data. I use 100% of the data to train the model and check the cost on that same data, yet the MSE is still very high.

import tensorflow as tf
from sklearn.datasets import load_boston

def polynomial(x, coeffs):
    y = 0
    for i in range(len(coeffs)):
        y += coeffs[i]*x**i
    return y

def initial_parameters(dimensions, data_type, degree): # list number of dims/features and degree
    thetas = [tf.Variable(0, dtype=data_type)] # the constant theta/bias
    for i in range(degree):
        thetas.append(tf.Variable( tf.zeros([dimensions, 1], dtype=data_type)))
    return thetas

def regression_error(x, y, thetas):
    hx = thetas[0] # constant thetas - no need to have 1 for each variable (e.g x^0*th + y^0*th...)
    for i in range(1, len(thetas)):
        hx = tf.add(hx, tf.matmul( tf.pow(x, i), thetas[i]))
    return tf.reduce_mean(tf.squared_difference(hx, y))

def polynomial_regression(x, y, data_type, degree, learning_rate, epoch): #features=dimensions=variables
    thetas = initial_parameters(x.shape[1], data_type, degree)
    cost = regression_error(x, y, thetas)
    init = tf.initialize_all_variables()
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(epoch): 
            sess.run(optimizer)
        return cost.eval()

x, y = load_boston(True) # yes just use the entire dataset
for deg in range(1, 2):
    for lr in range(-8, -5):
        error = polynomial_regression(x, y, tf.float64, deg, 10**lr, 100 )
        print (deg, lr, error)

It outputs an MSE of 97.3 (degree = 1, learning rate = 10^-6), even though most of the labels are around 30. What is wrong with the code?

The problem is that the features are on very different orders of magnitude, so a single learning rate shared by all of them cannot suit every feature: a step small enough to keep the large-valued features from diverging makes almost no progress on the small-valued ones. Moreover, when using a non-zero variable initialization, one has to make sure that the initial values are also compatible with the feature values.
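A standard way to see why one shared learning rate fails is to look at the quadratic MSE cost directly. The derivation below is a textbook sketch, not part of the original answer: it treats the (polynomially expanded) features as the columns of an n×d design matrix X, writes θ* for a minimizer of the cost, and assumes the Hessian H is positive definite.

```latex
\begin{aligned}
J(\theta) &= \tfrac{1}{n}\lVert X\theta - y\rVert^2, \qquad
\nabla J(\theta) = \tfrac{2}{n}X^\top (X\theta - y), \qquad
H = \tfrac{2}{n}X^\top X \\
\theta_{t+1} &= \theta_t - \eta\,\nabla J(\theta_t)
\;\Longrightarrow\;
\theta_{t+1} - \theta^* = (I - \eta H)(\theta_t - \theta^*) \\
\text{convergence} &\iff \lvert 1 - \eta\lambda \rvert < 1 \text{ for every eigenvalue } \lambda \text{ of } H
\iff 0 < \eta < \frac{2}{\lambda_{\max}(H)} \\
\lambda_{\min}(H) &\le H_{jj} = \tfrac{2}{n}\textstyle\sum_{i=1}^{n} x_{ij}^2 \le \lambda_{\max}(H)
\end{aligned}
```

With η ≈ 1/λ_max, the error along an eigenvector with eigenvalue λ shrinks by a factor of about 1 − λ/λ_max per step, so directions with λ ≪ λ_max need on the order of λ_max/λ steps. Because the diagonal entries are bracketed by the extreme eigenvalues, a feature whose squared values are orders of magnitude larger than another's guarantees a large ratio λ_max/λ_min. Standardizing the columns makes every diagonal entry equal to 2, although strongly correlated features can still leave the problem ill-conditioned.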

In [1]: from sklearn.datasets import load_boston

In [2]: x, y = load_boston(True)

In [3]: x.std(axis=0)
Out[3]: 
array([8.58828355e+00, 2.32993957e+01, 6.85357058e+00, 2.53742935e-01,
       1.15763115e-01, 7.01922514e-01, 2.81210326e+01, 2.10362836e+00,
       8.69865112e+00, 1.68370495e+02, 2.16280519e+00, 9.12046075e+01,
       7.13400164e+00])

In [4]: x.mean(axis=0)
Out[4]: 
array([3.59376071e+00, 1.13636364e+01, 1.11367787e+01, 6.91699605e-02,
       5.54695059e-01, 6.28463439e+00, 6.85749012e+01, 3.79504269e+00,
       9.54940711e+00, 4.08237154e+02, 1.84555336e+01, 3.56674032e+02,
       1.26530632e+01])
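These statistics already bound how small the learning rate has to be for the raw features. For an MSE cost the Hessian is H = (2/n)XᵀX, plain gradient descent is only stable for η < 2/λ_max(H), and each diagonal entry H_jj = 2·mean(x_j²) lies between the smallest and largest eigenvalue. The plain-Python sketch below is my own illustration, not part of the answer: it copies the printed means and standard deviations, uses mean(x_j²) = std_j² + mean_j², and ignores the bias column and the correlations between features, so it yields bounds rather than exact values.

```python
# Column statistics copied from the IPython output above (numpy std, ddof=0).
stds = [8.58828355e+00, 2.32993957e+01, 6.85357058e+00, 2.53742935e-01,
        1.15763115e-01, 7.01922514e-01, 2.81210326e+01, 2.10362836e+00,
        8.69865112e+00, 1.68370495e+02, 2.16280519e+00, 9.12046075e+01,
        7.13400164e+00]
means = [3.59376071e+00, 1.13636364e+01, 1.11367787e+01, 6.91699605e-02,
         5.54695059e-01, 6.28463439e+00, 6.85749012e+01, 3.79504269e+00,
         9.54940711e+00, 4.08237154e+02, 1.84555336e+01, 3.56674032e+02,
         1.26530632e+01]

# mean(x_j^2) = Var[x_j] + mean(x_j)^2; these are half the diagonal of H = (2/n) X^T X.
second_moments = [s ** 2 + m ** 2 for s, m in zip(stds, means)]

largest = max(second_moments)
smallest = min(second_moments)
j_max = second_moments.index(largest)
j_min = second_moments.index(smallest)

# lambda_max(H) >= 2 * largest, so a stable step must satisfy
# eta < 2 / lambda_max(H) <= 1 / largest.
eta_upper_bound = 1.0 / largest
# lambda_max / lambda_min >= ratio of the largest to the smallest diagonal entry.
diagonal_ratio = largest / smallest

print(f"largest mean(x^2):  column {j_max}, {largest:.4g}")
print(f"smallest mean(x^2): column {j_min}, {smallest:.4g}")
print(f"stable learning rate must be below {eta_upper_bound:.3g}")
print(f"condition number of X^T X is at least {diagonal_ratio:.3g}")
```

By this estimate the largest second moment belongs to column index 9 (mean ≈ 408) and the smallest to column index 3 (a 0/1 feature with mean ≈ 0.069), which puts the stability limit at roughly 5e-6 and the condition number above roughly 2.8 million. That is in line with the question's runs: learning rates of 10^-8 to 10^-6 stay below the limit, but with steps that small, 100 epochs barely move the weights of the small-valued features.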

A common approach is to normalize the input data (e.g. to zero mean and unit variance) and to choose the initial weights randomly (e.g. from a normal distribution with standard deviation 1). sklearn.preprocessing offers various functionality for these cases.
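To show what that normalization does without depending on scikit-learn, here is a small NumPy sketch on made-up data. The `standardize` helper is hypothetical; it follows what I understand to be StandardScaler's convention for zero-variance columns (they are only centred, with their scale treated as 1). One side effect worth knowing: a constant column, such as the all-ones column that polynomial feature expansion adds by default, comes out as all zeros, so the intercept has to come from somewhere else.

```python
import numpy as np


def standardize(x):
    """Rescale every column to zero mean and unit variance.

    Zero-variance columns are only centred (their scale is taken as 1),
    so a constant column ends up as all zeros.
    """
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0.0] = 1.0
    return (x - mean) / std


# Hypothetical data: columns on very different scales, plus a constant column.
rng = np.random.default_rng(0)
n = 500
x_raw = np.column_stack([
    rng.normal(400.0, 170.0, n),  # large-scale feature
    rng.normal(0.55, 0.12, n),    # small-scale feature
    rng.binomial(1, 0.07, n),     # 0/1 indicator feature
    np.ones(n),                   # constant column (an all-ones bias feature)
])
x_std = standardize(x_raw)

# Random initial weights drawn from a standard normal distribution, one per column.
thetas = rng.normal(0.0, 1.0, size=(x_std.shape[1], 1))

print("column means:", np.round(x_std.mean(axis=0), 6))
print("column stds: ", np.round(x_std.std(axis=0), 6))
```

After scaling, the three non-constant columns have mean 0 and standard deviation 1, so initial weights of order 1 produce predictions of a sensible magnitude for every feature.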

The body of the polynomial_regression function then reduces to:

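import tensorflow as tf  # TensorFlow 1.x API, as in the question
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree, include_bias=False)),  # all monomials up to `degree`
    ('scaler', StandardScaler())  # zero mean, unit variance per column
])
x = pipeline.fit_transform(x)
y = y.reshape(-1, 1)  # (n, 1); an (n,) label vector would broadcast against the (n, 1) predictions to (n, n)
thetas = tf.Variable(tf.random_normal([x.shape[1], 1], dtype=data_type))
bias = tf.Variable(0, dtype=data_type)  # explicit intercept: the scaler would turn an all-ones column into zeros
cost = tf.reduce_mean(tf.squared_difference(tf.matmul(x, thetas) + bias, y))

# Perform variable initialization and optimizer instantiation here.
# Run optimization over epochs.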
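load_boston has been removed from recent scikit-learn releases (1.2 and later), and TensorFlow 2 only exposes tf.Session under tf.compat.v1, so here is a self-contained NumPy sketch of the same recipe on made-up data: polynomial features, per-column standardization, random normal initialization, and full-batch gradient descent with an explicit intercept. The helpers `poly_features`, `standardize`, and `fit_gd` are hypothetical; `poly_features` is meant to produce the same set of columns as `PolynomialFeatures(degree, include_bias=False)`, not the scikit-learn implementation itself.

```python
import itertools

import numpy as np


def poly_features(x, degree):
    """All monomials of the columns of x with total degree 1..degree (no constant column)."""
    cols = []
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(x.shape[1]), d):
            cols.append(np.prod(x[:, list(combo)], axis=1))
    return np.column_stack(cols)


def standardize(x):
    """Zero mean, unit variance per column (assumes no constant columns)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def fit_gd(x, y, learning_rate, epochs, seed=0):
    """Full-batch gradient descent on MSE for y ~ x @ thetas + bias; returns the final MSE."""
    rng = np.random.default_rng(seed)
    thetas = rng.normal(0.0, 1.0, size=(x.shape[1], 1))  # random normal initial weights
    bias = 0.0                                          # explicit intercept
    y = y.reshape(-1, 1)                                # (n, 1), same shape as the predictions
    n = x.shape[0]
    for _ in range(epochs):
        residual = x @ thetas + bias - y
        thetas -= learning_rate * (2.0 / n) * (x.T @ residual)
        bias -= learning_rate * 2.0 * residual.mean()
    residual = x @ thetas + bias - y
    return float(np.mean(residual ** 2))


# Hypothetical data: two features whose scales differ by about three orders of
# magnitude, and a target that is a degree-2 polynomial of them plus noise.
rng = np.random.default_rng(42)
n = 400
x_raw = np.column_stack([rng.normal(400.0, 170.0, n), rng.normal(0.55, 0.12, n)])
z = standardize(x_raw)
y = 20.0 + 3.0 * z[:, 0] - 2.0 * z[:, 1] + 1.5 * z[:, 0] * z[:, 1] + rng.normal(0.0, 0.5, n)

features = poly_features(x_raw, degree=2)  # x0, x1, x0^2, x0*x1, x1^2
mse_scaled = fit_gd(standardize(features), y, learning_rate=0.1, epochs=2000)
# On the raw features, a learning rate above roughly 2e-11 would diverge because of the x0^2 column.
mse_raw = fit_gd(features, y, learning_rate=5e-12, epochs=2000)

print(f"standardized features: MSE = {mse_scaled:.3f}")
print(f"raw features:          MSE = {mse_raw:.3f}")
```

The exact numbers depend on the random draws, but with these settings the standardized run should end close to the noise variance (0.25), while the raw run stays far above it because, at such a tiny step size, essentially only the direction dominated by the x0^2 column gets fitted. The intercept is kept outside the scaled features on purpose, since standardizing an all-ones column would turn it into zeros.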
