Linear regression loop in Python (with 3 variables)

I'm attempting to run a linear regression inside a loop, with two independent variables and one dependent variable. I've created new arrays consisting of 1,000 random numbers drawn for each of the 74 data points. I'm able to run this first segment without any issues, but I'm having trouble when it comes to looping the linear regression function.

import numpy as np
from sklearn import linear_model

x = glodap_hot_merged_finalized['G2salinity']
y = glodap_hot_merged_finalized['G2talk']
z = glodap_hot_merged_finalized['G2temperature']

iterations = 1000

stdevs = np.empty((iterations,), dtype=float)
slopes = np.empty((iterations,), dtype=float)
intercepts = np.empty((iterations,), dtype=float)

nbot = len(x)
sal = x.values
alk = y.values
temp = z.values
           
sal_ens = np.random.randn(iterations, nbot) * 1e-3 + sal[np.newaxis, :]  
alk_ens = np.random.randn(iterations, nbot) * 2 + alk
temp_ens = np.random.randn(iterations, nbot) * 1e-2 + temp[np.newaxis, :]

# the shapes for sal_ens, alk_ens, and temp_ens are all (1000,74)

I've been trying to run the following loop in Python with the sal_ens, temp_ens, and alk_ens variables:

for i in range(iterations):

    X = sal_ens[i], temp_ens[i]
    Y = alk_ens[i]
 
    regr = linear_model.LinearRegression()
    regr.fit(X, Y)

    intercept_value = sm.add_constant(X) 
    
    intercept =  intercept_value[i]
    coef = regr.coef_[i]

I keep getting an error message that says:

ValueError: Found input variables with inconsistent numbers of samples: [2,74]

I'm trying to run 1000 regressions using the random numbers drawn for each variable (sal_ens, temp_ens, and alk_ens) in order to generate output with 1000 different slopes and intercepts.

Any input or help with this would be greatly appreciated!

To resolve your error, you just need to create your X array properly. Currently, your code makes X a tuple of two 1-dimensional arrays, each with shape (74,). If you look at the documentation for the LinearRegression().fit() method, you can see that X needs to be an array-like object of shape (n_samples, n_features). scikit-learn therefore reads your tuple as 2 samples with 74 features each, while Y has 74 samples, which is where the [2, 74] in the error message comes from.
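To see the mismatch for yourself, here is a minimal sketch that only inspects shapes. The arrays sal_row and temp_row are made-up stand-ins for sal_ens[i] and temp_ens[i] (random values, not your data), and np.column_stack is just one way to build the (n_samples, n_features) layout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for sal_ens[i] and temp_ens[i]: two 1-D arrays of 74 values
sal_row = rng.normal(35.0, 0.1, size=74)
temp_row = rng.normal(20.0, 1.0, size=74)

# What the loop currently builds: a tuple of two (74,) arrays.
# Converted to an array, it has 2 rows ("samples") and 74 columns ("features").
X_tuple = (sal_row, temp_row)
as_array = np.asarray(X_tuple)
print(as_array.shape)

# What fit() expects: one row per sample, one column per feature.
X_fixed = np.column_stack((sal_row, temp_row))
print(X_fixed.shape)
```

The first shape printed is (2, 74) and the second is (74, 2); only the second lines up with a Y of length 74.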

So, you can replace the line that assigns X with this:

X = np.hstack((sal_ens[i].reshape(-1,1), temp_ens[i].reshape(-1,1)))

The .reshape(-1,1) will convert each of the two arrays to a 2-dimensional array of shape (74,1), and then np.hstack(...) will stack them horizontally to give you your desired array of shape (74,2).
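For completeness, here is a rough sketch of how the whole loop might look once X has the right shape. It is not your exact setup: the sal, temp, and alk arrays are synthetic values I made up in place of the G2salinity, G2temperature, and G2talk columns, and the coefficients used to generate alk are arbitrary. It also assumes you want the fitted values from the regr.coef_ and regr.intercept_ attributes. LinearRegression fits an intercept by default, so sm.add_constant is not needed here.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(42)
iterations, nbot = 1000, 74

# Synthetic stand-ins for the dataframe columns (hypothetical values, not GLODAP data)
sal = rng.normal(35.0, 0.2, size=nbot)
temp = rng.normal(15.0, 5.0, size=nbot)
alk = 50.0 + 65.0 * sal - 0.5 * temp + rng.normal(0.0, 2.0, size=nbot)

# Perturbed ensembles of shape (iterations, nbot), using the noise scales from the question
sal_ens = rng.standard_normal((iterations, nbot)) * 1e-3 + sal[np.newaxis, :]
temp_ens = rng.standard_normal((iterations, nbot)) * 1e-2 + temp[np.newaxis, :]
alk_ens = rng.standard_normal((iterations, nbot)) * 2 + alk[np.newaxis, :]

# One row of two slopes (salinity, temperature) and one intercept per iteration
slopes = np.empty((iterations, 2), dtype=float)
intercepts = np.empty(iterations, dtype=float)

for i in range(iterations):
    # Same result as the hstack/reshape line above: shape (74, 2)
    X = np.column_stack((sal_ens[i], temp_ens[i]))
    Y = alk_ens[i]

    regr = linear_model.LinearRegression()
    regr.fit(X, Y)

    slopes[i] = regr.coef_            # array of 2 coefficients, one per feature
    intercepts[i] = regr.intercept_   # scalar intercept for this fit

print(slopes.mean(axis=0), intercepts.mean())
```

Note that regr.coef_ has one entry per feature, not per iteration, so indexing it with i (as in the original loop) would fail once i reaches 2; storing the whole array in row i of slopes avoids that.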
