
Best approach to use numpy 1d arrays for linear algebra

I am having a lot of "small" issues when using numpy for linear algebra manipulations, due to the way numpy treats "vectors", i.e. 1d arrays, which to my eye gives inconsistent behaviour.

My question is whether I am making some glaring mistake in how I use numpy arrays for linear algebra, or whether this is just how it works and there is no more obvious way to do it.

For example, let's say I want to perform univariate OLS on two vectors.

import numpy as np
from numpy import linalg as la

y = np.arange(10)
x = np.arange(10)
print(x.shape)

ols = la.inv(x.T@x)@(x.T@y)

LinAlgError: 0-dimensional array given. Array must be at least two-dimensional

So one solution is to force the arrays to have that extra dimension:

import numpy as np
from numpy import linalg as la

y = np.arange(10).reshape(-1, 1)
x = np.arange(10).reshape(-1, 1)

ols = la.inv(x.T@x)@(x.T@y)
print(ols)
>>> [[1.]]

Then one could think the problem is solved. But not quite: if X has more than one column, calculating t-values becomes a problem.

y = np.arange(10).reshape(-1, 1)
X = np.arange(20).reshape(10, 2)
b_hat = la.inv((X.T@X))@(X.T@y)

# Calculate standard errors
residual = y - X@b_hat
sigma_hat = residual.T@residual/(y.size - b_hat.size)
b_var = sigma_hat*la.inv(X.T@X)
b_std = np.sqrt(b_var.diagonal())  # The diagonal method returns 1d array.

# Calculate t-values
t_values = b_hat/b_std
print(t_values)

>>> [[2.47854011e+13 2.67712930e+13]
 [1.40888694e+00 1.52177182e+00]]

Which of course was not intended. Why does this happen? It's because np.sqrt(b_var.diagonal()) returns b_std with shape (2,). When I divide b_hat/b_std, the shapes (2, 1) and (2,) are not equal, so numpy does not perform the element-wise division I expected; instead it broadcasts the two arrays against each other and produces a (2, 2) "outer" division.
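A minimal sketch (with dummy values, not the regression numbers above) of what broadcasting does to these shapes:

import numpy as np

b_hat = np.ones((2, 1))  # stand-in column vector, shape (2, 1)
b_std = np.ones(2)       # stand-in 1d array, shape (2,)

# Broadcasting aligns trailing axes: (2, 1) with (2,) gives (2, 2),
# an "outer" division rather than the intended element-wise one.
print((b_hat / b_std).shape)
>>> (2, 2)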

The solution for this is of course to again use .reshape(-1, 1), but my calculations will become increasingly complex, so it is cumbersome to always check whether a vector was returned as a 1d or a 2d array and then reshape it. It is also error-prone: I might accidentally reshape a matrix into a vector.
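Applied to the example above, that workaround would look like this (a sketch of the fix, continuing from the code above):

# Force b_std back into a column before dividing.
b_std = np.sqrt(b_var.diagonal()).reshape(-1, 1)  # (2, 1)
t_values = b_hat / b_std  # (2, 1) / (2, 1): element-wise, as intended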

So again: am I making some glaring mistake in how I use numpy arrays for linear algebra, or is this just how it works, with no more obvious way to do it?

Perhaps remembering these rules will help provide the caution you are looking for when using numpy (the terms "axes" and "dimensions" mean the same thing here):

  1. In Mathematics, when we write down a series of numbers [1 2 3 4], the semantics we choose to associate with that notation vary a bit loosely with context. There are times when we consider it a single-axis array (which is the correct semantics), but there are times when we treat it as "1 row, 4 columns". How else would you justify mathematicians' claim that a column vector, when transposed, gives a row vector, and vice versa? The term "transpose" means an interchange of rows and columns, which itself implies that there are two axes, not just one. In numpy, the semantics for the same thing are consistently and strictly "a single axis of length 4", never "first axis of length 1 and second axis of length 4".
  2. In numpy, as in Mathematics, the idea of transpose makes sense only if you have at least two axes. As noted above, Mathematics has no consistent notation distinguishing a single-axis array from a two-axes array, so there this rule is really moot. In numpy, performing arr.T simply returns arr if arr happens to be a single-axis array.
  3. In numpy, we can add an extra axis of unit length at any position we choose. One notation for this is arr.reshape(n1, n2, ..., 1, ..., nk) (that is, inserting a 1 in the midst of the existing comma-separated axis lengths). Another is the indexing notation arr[:, :, ..., None, ..., :] (that is, having as many comma-separated colons as there are axes, and inserting a None amongst them). In place of None, np.newaxis can also be used, but it's a bit more verbose.
  4. Based on the above, we might expect the numpy matrix multiplication operator @ to throw an error in arr @ arr.T if arr were to have a single axis (eg, shape (3,)). (How could matrix multiplication be defined for single-axis arrays?) Instead, the expression computes the sum-product of the elements of arr and arr.T and returns it as a scalar (not even as a single-element array); see the sketch after this list.
  5. In numpy , arithmetic and comparison operators, when used between two arrays of the same shape, will get applied "element-wise" (which means between each pair of corresponding elements belonging to the two arrays). This would result in a new array, whose shape is the same as that of the operand arrays.
  6. With arithmetic and comparison operators, if the two operands are arrays whose shapes are different but broadcastable to a common shape, again, the operator gets applied element-wise after the broadcasting, and the result would again be an array with broadcast-generated shape.
  7. With arithmetic and comparison operators, if one of the operands is a scalar, the scalar will be treated as an array of shape (1,) , and the previous (broadcasting) rule will then be applied.
  8. While points 5, 6 and 7 above actually add to the expressive power of numpy, they frequently surprise new learners. For example, 1.0 / arr, where arr is [1 2 3 4], will produce a new array consisting of the values [1.0/1 1.0/2 1.0/3 1.0/4]. (I think this was one of the surprises you faced when you performed the division.)
  9. If arr has a shape of (3,4,1,5,2,1,1) , then arr.squeeze() will get rid of the unit-length axes, thus returning an array of shape (3,4,5,2)
  10. When we index a multi-dimensional array, we normally expect the result to have the same or lower dimensionality (fewer axes, and/or smaller lengths along the same axes) than the array being indexed. In numpy, an indexing expression such as arr[my_index_arr] can produce a shape that is more complex and has higher dimensionality than the original array arr. Again, this is a powerful expressive feature that can sometimes surprise/confuse new learners. In numpy, this is called Advanced Indexing with Integer Arrays.
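A minimal sketch exercising several of these rules (2, 3, 4, 8 and 9) on a toy array:

import numpy as np

arr = np.array([1, 2, 3, 4])     # single axis, shape (4,)

print(arr.T.shape)               # (4,)   -- rule 2: .T on a 1d array is a no-op
print(arr[:, None].shape)        # (4, 1) -- rule 3: insert a unit-length axis
print(arr.reshape(1, -1).shape)  # (1, 4) -- rule 3, via reshape
print(arr @ arr.T)               # 30, a plain scalar -- rule 4
print(1.0 / arr)                 # [1. 0.5 0.333... 0.25] -- rule 8
print(arr[:, None, None].squeeze().shape)  # (4,) -- rule 9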

To stress one point from the above, just be extra careful about your expectations, when your array has a single axis (of shape like (L,) ).

Even in MATLAB which is matrix oriented (and originally only had 2d matrices), I found that keeping track of dimensions was 80% of the debugging battle. In numpy there's no shortcut to keeping a close eye on dimensions. Don't assume.

In your first case, the arrays are 1d:

In [305]: x
Out[305]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [306]: x.T                   # same, only one axis to 'switch'
Out[306]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The matrix product for 1d arrays is the dot/inner product, and the result is a scalar:

In [307]: x.T@x
Out[307]: 285

@/matmul does not like to work with scalars. np.dot is ok with them, but matmul was created for working with 'batches' of arrays:

In [308]: (x@x)@(x@y)
Traceback (most recent call last):
  File "<ipython-input-308-2d486b202d47>", line 1, in <module>
    (x@x)@(x@y)
TypeError: unsupported operand type(s) for @: 'numpy.int64' and 'numpy.int64'
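For contrast, a quick sketch of np.dot with those same scalars (each inner product here is 285, so the result is just 285*285):

np.dot(x@x, x@y)   # np.dot accepts the 0d/scalar operands that @ rejects
# -> 81225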

Then you create 2d arrays with (10,1) shape:

In [309]: y = y[:,None]
In [310]: y
Out[310]: 
array([[0],
       [1],
       ...
       [9]])
In [311]: y.T             # (1,10) shape
Out[311]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Now the matmul (1,10) with (10,1) sums on the 10s to produce a (1,1) array:

In [312]: y.T@y
Out[312]: array([[285]])
In [313]: _.shape
Out[313]: (1, 1)

Two (1,1) arrays work with @ to again produce a (1,1), and inv is ok with that.

With the X:

In [314]: X.shape
Out[314]: (10, 2)
In [315]: X.T@X         #  (2,10) with (10,2) producing (2,2)
Out[315]: 
array([[1140, 1230],
       [1230, 1330]])
In [316]: X.T@y         # (2,10) with (10,1) producing (2,1)  
Out[316]: 
array([[570],
       [615]])
In [317]: (X.T@X)@(X.T@y)         # (2,2) with (2,1) producing (2,1)
Out[317]: 
array([[1406250],
       [1519050]])

The key with all of these is that (K,n) with (n,M) produces (K,M), with the sum of products taken over the shared n: the last axis of the first argument pairs with the second-to-last axis of the second. That's the high-school "across the rows and down the columns" method of doing a matrix product.
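That rule, spelled out with einsum on the (2,10) and (10,2) case from above:

np.einsum('kn,nm->km', X.T, X)   # sum over the shared n; same as X.T @ X
# -> array([[1140, 1230],
#           [1230, 1330]])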

The matmul docs talk about promoting 1d arrays to 2d to perform the same operation - but the extra dimension is removed from the result.
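For example, with the 1d x and the (10,2) X from this session:

x @ X   # (10,) promoted to (1,10); (1,10)@(10,2) -> (1,2); the added 1 is dropped
# -> array([570, 615]), shape (2,)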

The next @ joins (10,2) with (2,1) to produce a (10,1) (summing on the 2s):

In [319]: X@Out[317]
Out[319]: 
array([[ 1519050],
       [ 7369650],
       [13220250],
        ...
       [54174450]])

That can be subtracted from the (10,1) y.

residual.T@residual is (1,10) with (10,1), producing a (1,1) (a squared magnitude).

sigma_hat*la.inv(X.T@X) is a (1,1) times a (2,2), resulting in a (2,2) (that would have worked just as well if sigma_hat were a scalar). inv(A) on a (1,1) is just 1/A.
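For instance, on the (1,1) y.T@y from earlier:

la.inv(y.T @ y)   # the inverse of [[285]] is just its reciprocal
# -> array([[0.00350877]])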

b_var.diagonal() throws away half the values of b_var. Since b_var is just a scalar times the inverse of X.T@X, we can illustrate with X.T@X itself:

In [323]: X.T@X
Out[323]: 
array([[1140, 1230],
       [1230, 1330]])
In [324]: (X.T@X).diagonal()
Out[324]: array([1140, 1330])

Which is the same as summing the element-wise products over the size-10 dimension:

In [325]: (X*X).sum(0)
Out[325]: array([1140, 1330])
In [326]: np.einsum('ij,ij->j',X,X)
Out[326]: array([1140, 1330])

It's treating the size-2 dimension as a 'batch':

In [328]: [X[:,i]@X[:,i] for i in range(2)]
Out[328]: [1140, 1330]

matmul is designed to work with 'batches' - for 3d (and higher) arrays:

In [329]: Xt = X.T
In [330]: Xt.shape
Out[330]: (2, 10)
In [331]: Xt[:,None,:]@Xt[:,:,None]    # (2,1,10) with (2,10,1)=>(2,1,1)
Out[331]: 
array([[[1140]],

       [[1330]]])
In [332]: (Xt[:,None,:]@Xt[:,:,None]).squeeze()
Out[332]: array([1140, 1330])

This suggests that X should have been a (2,10), or even a (2,1,10), right from the start (and y a (1,10)?).
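Alternatively, a sketch going the other way on the same toy data: commit to 1d vectors throughout (using la.solve rather than inv for the fit), so that b_hat, b_std and t_values all come out with a single axis and the final division is element-wise by construction:

import numpy as np
from numpy import linalg as la

y = np.arange(10.0)                 # (10,)
X = np.arange(20.0).reshape(10, 2)  # (10, 2)

b_hat = la.solve(X.T @ X, X.T @ y)  # (2,): X.T@y is (2,), and solve keeps it 1d
residual = y - X @ b_hat            # (10,): (10,2)@(2,) -> (10,)
sigma_hat = residual @ residual / (y.size - b_hat.size)  # scalar
b_var = sigma_hat * la.inv(X.T @ X)                      # (2, 2)
b_std = np.sqrt(b_var.diagonal())                        # (2,)
t_values = b_hat / b_std            # (2,) / (2,): element-wise, no reshape needed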

This answer is a bit long-winded, but hopefully the details give you ideas of how to keep track of dimensions. It's meant to complement the general principles of the other answer.
