
How to apply linear regression with a fixed x-intercept in Python?

I've found quite a few examples of fitting a linear regression with zero intercept.

However, I would like to fit a linear regression with a fixed x-intercept. In other words, the regression line should cross the x-axis at a specific x value.

I have the following code for plotting.

import numpy as np
import matplotlib.pyplot as plt

xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])


ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])


def best_fit_slope_and_intercept(xs, ys):
    # m = xs.dot(ys)/xs.dot(xs)  # (slope of a fit forced through the origin)
    m = (((np.average(xs)*np.average(ys)) - np.average(xs*ys)) /
         ((np.average(xs)*np.average(xs)) - np.average(xs*xs)))
    b = np.average(ys) - m*np.average(xs)
    return m, b


def rSquaredValue(ys_orig, ys_line):
    def sqrdError(ys_orig, ys_line):
        return np.sum((ys_line - ys_orig) * (ys_line - ys_orig))
    yMeanLine = np.average(ys_orig)
    sqrdErrorRegr = sqrdError(ys_orig, ys_line)
    sqrdErrorYMean = sqrdError(ys_orig, yMeanLine)
    return 1 - (sqrdErrorRegr/sqrdErrorYMean)


m, b = best_fit_slope_and_intercept(xs, ys)
regression_line = m*xs+b

r_squared = rSquaredValue(ys, regression_line)
print(r_squared)

plt.plot(xs, ys, 'bo')
# Normal best fit
plt.plot(xs, m*xs+b, 'r-')
# Zero intercept
plt.plot(xs, m*xs, 'g-')
plt.show()

And I want something like the following, where the regression line starts at (5, 0). [figure: desired fit, a regression line crossing the x-axis at (5, 0)]

Thank you. Any and all help is appreciated.

I've been thinking about this for some time and I've found a possible workaround.

If I understood correctly, you want to find the slope and intercept of a linear regression model with a fixed x-axis intercept.

Provided that's the case (say you want the x-axis intercept to take the value forced_intercept), it is as if you shifted all the points by -forced_intercept along the x-axis and then forced scikit-learn to use a y-axis intercept of 0. That gives you the slope. To find the intercept, isolate b in y = ax + b and force the line through the point (forced_intercept, 0): from 0 = a*forced_intercept + b you get b = -a*forced_intercept (where a is the slope). In code (notice the reshaping of xs):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0]).reshape((-1, 1))  # note: you must reshape xs into a 2-D column, or scikit-learn will raise a ValueError


ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

forced_intercept = 5 #as you provided in your example of (5,0)

new_xs = xs - forced_intercept  # here we "move" (shift) all the points
model = LinearRegression(fit_intercept=False).fit(new_xs, ys)  # force a y-axis intercept of 0
r = model.score(new_xs,ys)
a = model.coef_

b = -1 * a * forced_intercept  # the intercept that makes the line pass through (forced_intercept, 0)

print(r,a,b)
plt.plot(xs,ys,'o')
plt.plot(xs,a*xs+b)
plt.show()
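
As a cross-check, the same shifted fit has a closed form in plain NumPy: the slope of a zero-intercept least-squares fit is x.dot(y)/x.dot(x), the formula that appears commented out in the question. A minimal sketch, reusing xs, ys and forced_intercept from above (ravel() undoes the reshape):

x_shifted = xs.ravel() - forced_intercept  # shift so the forced intercept sits at x = 0
a_closed = x_shifted.dot(ys) / x_shifted.dot(x_shifted)  # zero-intercept least-squares slope
b_closed = -a_closed * forced_intercept
print(a_closed, b_closed)  # should match model.coef_[0] and b above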

Hope this is what you were looking for.

Maybe this approach will be useful.

import numpy as np
import matplotlib.pyplot as plt

xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])

ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

# First we add the anchor point (5, 0) to the data set.
xs = np.append(xs, [5.])
ys = np.append(ys, [0.])

# Then we build the coefficient matrix as described in the docs:
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
A = np.vstack([xs, np.ones(len(xs))]).T

# Then we prepare weights for the points. All weights are equal except
# the one for the added anchor point, which here is 1000 times larger
# than the others.
W = np.diag(np.ones([len(xs)]))
W[-1,-1] = 1000.

# Finally we solve the weighted least-squares problem.
m, c = np.linalg.lstsq(np.dot(W, A), np.dot(W, ys), rcond=None)[0]

plt.plot(xs, ys, 'o', label='Original data', markersize=10)
plt.plot(xs, m * xs + c, 'r', label='Fitted line')
plt.show()
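
Note that the anchor is a soft constraint here: np.linalg.lstsq on (W @ A, W @ ys) minimizes ||W(Ax - ys)||**2, so the anchor's residual is weighted 1000 times (10**6 in squared error) and the fitted line passes very close to, but not exactly through, (5, 0). You can check the remaining gap with print(m * 5. + c); increasing W[-1,-1] drives it toward 0.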

[figure: original data points with the fitted line passing near the anchor point (5, 0)]

If you use scikit-learn for the linear regression task, you can set the intercept via the intercept_ attribute.
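
A minimal sketch of that idea, combined with the shift trick from the first answer (note: intercept_ is normally set by fit(); overwriting it afterwards works because predict() simply computes X @ coef_ + intercept_, but treat this as an assumption, not a documented workflow):

import numpy as np
from sklearn.linear_model import LinearRegression

# assumes xs reshaped to a column, (-1, 1), and ys as in the answers above
forced_intercept = 5
model = LinearRegression(fit_intercept=False).fit(xs - forced_intercept, ys)
# Overwrite the intercept so predict() works on the original, unshifted x values:
model.intercept_ = float(-model.coef_[0] * forced_intercept)
print(model.predict(np.array([[forced_intercept]])))  # ~ [0.], the line passes through (5, 0)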
