
Apply linear regression with a log-normal scale on the y axis and a probability scale on the x axis

I am trying to calculate linear regression coefficients, but I keep getting errors related to tuple.

I want to plot a log-normal linear regression with Python, calculate the intercept b0 and slope b1 from the following data, and then calculate the y value for x=50 and x=84.1.

The x axis should use a probability scale and the y axis a log-normal scale. I am not sure whether the method I wrote is correct for implementing linear regression on a log-normal/probability scale and calculating the coefficients.

The code I am using is:

from matplotlib import pyplot as plt
import numpy as np
import probscale  # importing it registers the 'prob' axis scale with matplotlib
import seaborn

# Permeability values (mD)
y = [283, 650, 565, 407, 714, 500, 730, 900, 420, 591, 381, 430, 324, 440, 1212, 315, 450]

# Permeability values in descending order (y, mD)
y.sort(reverse = True)
print('Permeability values in Descending Order :', y)

# Percentage of samples with larger permeability (x, %)
x = tuple([round(n/len(y)*100, 1) for n in range(len(y))])
print('Percentage of samples with larger permeability :', x)

# Plot
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_xlim(0.01, 99)
ax.set_xscale('prob')
ax.set_ylim(1e0, 1e4)
ax.set_yscale('log')
seaborn.despine(fig=fig)
plt.plot(x, y, 'go')
plt.title('Permeability Variation')
plt.ylabel('Permeability, mD')
plt.xlabel('Percent of Samples with Larger Permeability, %')
plt.grid(True)
plt.show()

# Mean for x and y
mean_x = np.mean(x)
mean_y = np.mean(y)

# Total number of values
m = len(x)

# Calculate b1 and b0
numer = 0
denom = 0
for i in range(m):
    numer += (x[i] - mean_x) * (y[i] - mean_y)
    denom += (x[i] - mean_x) ** 2
b1 = numer / denom
b0 = mean_y - (b1 * mean_x)

# Print coefficients
print('b1 = ', b1, 'b0 = ', b0)

# Calculate permeability at 84.1% and 50% probability (Percentiles)
# Calculate variance for permeability distribution (VDP)
k1 = b0 + b1 * 50
k2 = b0 + b1 * 84.1

# Dykstra Parsons Formula 'VDP' (k1=@50% Percentile and k2=@84.1% Percentile)
vdp = (k1 - k2) / k1
print('vdp = ', vdp)

# Calculate R^2 score (coefficient of determination)
sumofsquares = 0
sumofresiduals = 0
for i in range(m):
    y_pred = b0 + b1 * x[i]
    sumofsquares += (y[i] - mean_y) ** 2
    sumofresiduals += (y[i] - y_pred) ** 2
score = 1 - (sumofresiduals / sumofsquares)
print('R^2 score = ', score)

Ideally it would look something like this, with a straight best-fit line from the linear regression (this is just an example): https://i.stack.imgur.com/nZG7W.png

You are effectively calculating the covariance over the variance, which gives you the regression coefficient. You can calculate all the values using a list comprehension or numpy, but yes, it's correct.
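To make the "covariance over variance" point concrete, here is a minimal numpy sketch using the data from the question. The names x_arr and y_arr are my own; it should reproduce the b1 and b0 that the loop in the question computes.

```python
import numpy as np

# Permeability values from the question, sorted in descending order
y = sorted([283, 650, 565, 407, 714, 500, 730, 900, 420, 591,
            381, 430, 324, 440, 1212, 315, 450], reverse=True)
# Percentage of samples with larger permeability, as in the question
x = [round(n / len(y) * 100, 1) for n in range(len(y))]

x_arr = np.asarray(x, dtype=float)
y_arr = np.asarray(y, dtype=float)

# Slope = cov(x, y) / var(x); the same ddof must be used in both terms
b1 = np.cov(x_arr, y_arr, ddof=1)[0, 1] / np.var(x_arr, ddof=1)
# Intercept follows from the fact that the line passes through the means
b0 = y_arr.mean() - b1 * x_arr.mean()

print('b1 =', b1, 'b0 =', b0)
```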

The one thing I am not sure of, and only you can answer, is whether the linear relationship is between x and y, or between x and log(y).
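If what you want is a line that is straight on the plotted axes, one possible sketch is to regress log10(y) on the normal quantile of x. The reasoning is that a log axis spaces values by log10, and a probability axis (with probscale's default normal distribution) spaces them by the normal quantile function. This is an assumption about your intent, not something the question confirms. The 0% point is dropped because its normal quantile is minus infinity, and predict is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# Same data and plotting positions as in the question
y = np.array(sorted([283, 650, 565, 407, 714, 500, 730, 900, 420, 591,
                     381, 430, 324, 440, 1212, 315, 450], reverse=True),
             dtype=float)
x = np.array([round(n / len(y) * 100, 1) for n in range(len(y))])

# Assumption: drop percentages of exactly 0 or 100, whose quantiles are infinite
mask = (x > 0) & (x < 100)
z = stats.norm.ppf(x[mask] / 100)   # position on a normal probability axis
log_y = np.log10(y[mask])           # position on a log10 axis

# Ordinary least squares in the transformed space
fit = stats.linregress(z, log_y)

def predict(percent):
    """Permeability predicted by the transformed-space fit at a given percent."""
    return 10 ** (fit.intercept + fit.slope * stats.norm.ppf(percent / 100))

k50 = predict(50)
k84 = predict(84.1)
vdp = (k50 - k84) / k50  # Dykstra-Parsons formula from the question

print('slope =', fit.slope, 'intercept =', fit.intercept)
print('k50 =', k50, 'k84.1 =', k84, 'vdp =', vdp)
```

Because the normal quantile of 50% is zero, k50 in this sketch is simply 10 raised to the intercept.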

Below I use scipy.stats.linregress() to run the regression and plot the result. You can see that b1 and b0 are the same; hope this answers your question:

from matplotlib import pyplot as plt
import seaborn
import probscale
import numpy as np
from scipy import stats

y = [283, 650, 565, 407, 714, 500, 730, 900, 420, 591, 381, 430, 324, 440, 1212, 315, 450]
y.sort(reverse = True)

x = tuple([round(n/len(y)*100, 1) for n in range(len(y))])

slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)

fig, ax = plt.subplots(figsize=(10, 8))
ax.set_xlim(0.01, 99)
ax.set_xscale('prob')
ax.set_ylim(1e0, 1e4)
ax.set_yscale('log')
seaborn.despine(fig=fig)
plt.plot(x, y, 'go')
plt.plot(x, [intercept + slope * xi for xi in x], '--k')  # fitted regression line
plt.show()


print(slope,intercept)
-7.2679081487585195 889.7839128827538
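With those coefficients, the values at x=50 and x=84.1 and the Dykstra-Parsons coefficient follow from the formulas already in the question. A small sketch, with the printed slope and intercept pasted in as constants and assuming the relationship is linear between x and y:

```python
# Coefficients as printed by linregress above
slope = -7.2679081487585195
intercept = 889.7839128827538

# Permeability at the 50% and 84.1% points of the fitted line
k50 = intercept + slope * 50
k84 = intercept + slope * 84.1

# Dykstra-Parsons coefficient, as defined in the question
vdp = (k50 - k84) / k50

print('k at 50% =', k50)
print('k at 84.1% =', k84)
print('vdp =', vdp)
```

On this fit, k comes out at roughly 526 mD at 50% and 279 mD at 84.1%, giving a VDP of about 0.47.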
