[英]Linear least-squares solution for 3d inputs
说我有两个具有以下形状的数组:
y.shape
是(z, b)
。 将其描绘为z (b,)
y个向量的集合。 x.shape
是(z, b, c)
。 将其描述为z (b, c)
多元x矩阵的集合。 我的目的是找到最小二乘系数解的z
独立矢量。 即,第一个解决方案是对x[0]
y[0]
上的y[0]
进行回归,其中这些输入分别具有形状(b, )
和(b, c)
。 ( b
观测值, c
特征。)结果将是形状(z, c)
。
np.random.seed(123)
x = np.random.randn(190, 20, 3)
y = np.random.randn(190, 20) # Assumes no intercept term
# First vector of coefficients
np.linalg.lstsq(x[0], y[0])[0]
# array([-0.12823781, -0.3055392 , 0.11602805])
# Last vector of coefficients
np.linalg.lstsq(x[-1], y[-1])[0]
# array([-0.02777503, -0.20425779, 0.22874169])
NumPy的最小二乘解算器lstsq
不能对此进行操作。 (我的预期结果是形状(190, 3)
或190个向量,每个向量具有3个系数。每个(3,)
向量是n = 20的回归中的一个系数集。
是否有一种变通方法来获取包裹在一个结果数组中的系数矩阵? 我可能在想矩阵公式 :
对于1d y和2d x,这将是:
def coefs(y, x):
return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))
但是我很难让它接受上面的2d y和3d x 。
最后,我很好奇为什么lstsq
在这里遇到麻烦。 关于为什么输入必须最多为2d,有一个简单的答案吗?
这是一些演示示例:
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_regression
from time import perf_counter as pc
np.set_printoptions(edgeitems=3,infstr='inf',
linewidth=160, nanstr='nan', precision=1,
suppress=False, threshold=1000, formatter=None)
""" Create task """
Z, B, C = 4, 3, 2
Zs = []
Bs = []
for i in range(Z):
X, y, = make_regression(n_samples=B, n_features=C, random_state=i)
Zs.append(X)
Bs.append(y)
Zs = np.array(Zs)
Bs = np.array(Bs)
""" Independent looping """
print('LOOPED CALLS')
start = pc()
result = np.empty((Z, C))
for z in range(Z):
result[z] = np.linalg.lstsq(Zs[z], Bs[z])[0]
end = pc()
print('lhs-shape: ', Zs.shape)
print('lhs-dense-fill-ratio: ', np.count_nonzero(Zs) / np.product(Zs.shape))
print('used time: ', end-start)
print(result)
""" Embedding in one """
print('EMBEDDING INTO ONE CALL')
Zs_ = sp.block_diag([Zs[i] for i in range(Z)]).todense() # convenient to use scipy.sparse
# oops: there is a dense-one too:
# -> scipy.linalg.block_diag
Bs_ = Bs.flatten()
start = pc() # one could argue if transform above should be timed too!
result_ = np.linalg.lstsq(Zs_, Bs_)[0]
end = pc()
print('lhs-shape: ', Zs_.shape)
print('lhs-dense-fill-ratio: ', np.count_nonzero(Zs_) / np.product(Zs_.shape))
print('used time: ', end-start)
print(result_)
LOOPED CALLS
lhs-shape: (4, 3, 2)
lhs-dense-fill-ratio: 1.0
used time: 0.0005415275241778155
[[ 89.2 43.8]
[ 68.5 41.9]
[ 61.9 20.5]
[ 5.1 44.1]]
EMBEDDING INTO ONE CALL
lhs-shape: (12, 8)
lhs-dense-fill-ratio: 0.25
used time: 0.00015907748341232328
[ 89.2 43.8 68.5 41.9 61.9 20.5 5.1 44.1]
原始数据如下所示:
[[[ 2.2 1. ]
[-1. 1.9]
[ 0.4 1.8]]
[[-1.1 -0.5]
[-2.3 0.9]
[-0.6 1.6]]
[[ 1.6 -2.1]
[-0.1 -0.4]
[-0.8 -1.8]]
[[-0.3 -0.4]
[ 0.1 -1.9]
[ 1.8 0.4]]]
[[ 242.7 -5.4 112.9]
[ -95.7 -121.4 26.2]
[ 57.9 -12. -88.8]
[ -17.1 -81.6 28.4]]
每个解决方案看起来像:
LHS
[[ 2.2 1. ]
[-1. 1.9]
[ 0.4 1.8]]
RHS
[ 242.7 -5.4 112.9]
嵌入式问题(一个解决步骤)如下所示:
LHS
[[ 2.2 1. 0. 0. 0. 0. 0. 0. ]
[-1. 1.9 0. 0. 0. 0. 0. 0. ]
[ 0.4 1.8 0. 0. 0. 0. 0. 0. ]
[ 0. 0. -1.1 -0.5 0. 0. 0. 0. ]
[ 0. 0. -2.3 0.9 0. 0. 0. 0. ]
[ 0. 0. -0.6 1.6 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 1.6 -2.1 0. 0. ]
[ 0. 0. 0. 0. -0.1 -0.4 0. 0. ]
[ 0. 0. 0. 0. -0.8 -1.8 0. 0. ]
[ 0. 0. 0. 0. 0. 0. -0.3 -0.4]
[ 0. 0. 0. 0. 0. 0. 0.1 -1.9]
[ 0. 0. 0. 0. 0. 0. 1.8 0.4]]
RHS
[ 242.7 -5.4 112.9 -95.7 -121.4 26.2 57.9 -12. -88.8 -17.1 -81.6 28.4]
考虑到 lstsq 的假设/标准形式, 没有任何大量的零 就 可以嵌入这种独立性假设 ,这是没有办法的 !
lstsq是:
这次针对您的特定形状修剪了输出:还要测试一个稀疏求解器 :
添加代码(末尾)
print('EMBEDDING -> sparse-solver')
Zs_ = sp.csc_matrix(Zs_) # sparse!
start = pc()
result__ = sp.linalg.lsmr(Zs_, Bs_)[0]
end = pc()
print('lhs-shape: ', Zs_.shape)
print('lhs-dense-fill-ratio: ', Zs_.nnz / np.product(Zs_.shape))
print('used time: ', end-start)
print(result__)
产量
LOOPED CALLS
lhs-shape: (190, 20, 3)
lhs-dense-fill-ratio: 1.0
used time: 0.01716980329027777
[ 11.9 31.8 29.6]
...
[ 44.8 28.2 62.3]]
EMBEDDING INTO ONE CALL
lhs-shape: (3800, 570)
lhs-dense-fill-ratio: 0.00526315789474
used time: 0.6774500271820254
[ 11.9 31.8 29.6 ... 44.8 28.2 62.3]
EMBEDDING -> sparse-solver
lhs-shape: (3800, 570)
lhs-dense-fill-ratio: 0.00526315789474
used time: 0.0038423098412817547 # a bit of a surprise
[ 11.9 31.8 29.6 ... 44.8 28.2 62.3]
一般而言: 独立解决 !
在某些情况下,使用稀疏求解器方法可以更快地解决上述任务, 但是由于我们正在比较两种完全不同的算法 (直接与迭代),因此此处的分析很困难,并且其他数据的结果可能会以戏剧性的方式发生变化。
这是线性代数解决方案,其速度与@sascha的循环版本(适用于较小阵列)的水平相当。
print('Matrix formulation')
start = pc()
result = np.squeeze(np.matmul(np.linalg.inv(np.matmul(Zs.swapaxes(1,2), Zs)),
np.matmul(Zs.swapaxes(1,2), np.atleast_3d(Bs))))
end = pc()
print('used time: ', end-start)
print(result)
输出继电器:
Matrix formulation
used time: 0.00015713176480858237
[[ 89.2 43.8]
[ 68.5 41.9]
[ 61.9 20.5]
[ 5.1 44.1]]
但是,@ sascha的答案对于较大的输入很容易取胜,尤其是随着三维尺寸的增长(外生变量/特征的数量)而增加时。
Z, B, C = 400, 300, 20
Zs = []
Bs = []
for i in range(Z):
X, y, = make_regression(n_samples=B, n_features=C, random_state=i)
Zs.append(X)
Bs.append(y)
Zs = np.array(Zs)
Bs = np.array(Bs)
# --------
print('Matrix formulation')
start = pc()
result = np.squeeze(np.matmul(np.linalg.inv(np.matmul(Zs.swapaxes(1,2), Zs)),
np.matmul(Zs.swapaxes(1,2), np.atleast_3d(Bs))))
end = pc()
print('used time: ', end-start)
print(result)
# --------
print('Looped calls')
start = pc()
result = np.empty((Z, C))
for z in range(Z):
result[z] = np.linalg.lstsq(Zs[z], Bs[z])[0]
end = pc()
print('used time: ', end-start)
print(result)
输出:
Matrix formulation
used time: 0.24000779996413257
[[ 1.2e+01 1.3e-15 6.3e+01 ..., -8.9e-15 5.3e-15 -1.1e-14]
[ 5.8e+01 2.7e-14 -4.8e-15 ..., 8.5e+01 -1.5e-14 1.8e-14]
[ 1.2e+01 -1.2e-14 4.4e-16 ..., 6.0e-15 8.6e+01 6.0e+01]
...,
[ 2.9e-15 6.6e+01 1.1e-15 ..., 9.8e+01 -2.9e-14 8.4e+01]
[ 2.8e+01 6.1e+01 -1.2e-14 ..., -2.5e-14 6.3e+01 5.9e+01]
[ 7.0e+01 3.3e-16 8.4e+00 ..., 4.1e+01 -6.2e-15 5.8e+01]]
Looped calls
used time: 0.17400113389658145
[[ 1.2e+01 7.1e-15 6.3e+01 ..., -2.8e-14 1.1e-14 -4.8e-14]
[ 5.8e+01 -5.7e-14 -4.9e-14 ..., 8.5e+01 -5.3e-15 6.8e-14]
[ 1.2e+01 3.6e-14 4.5e-14 ..., -3.6e-15 8.6e+01 6.0e+01]
...,
[ 6.3e-14 6.6e+01 -1.4e-13 ..., 9.8e+01 2.8e-14 8.4e+01]
[ 2.8e+01 6.1e+01 -2.1e-14 ..., -1.4e-14 6.3e+01 5.9e+01]
[ 7.0e+01 -1.1e-13 8.4e+00 ..., 4.1e+01 -9.4e-14 5.8e+01]]
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.