Create an n x m array of polynomials from (n x 1) data using NumPy/pandas

I created an array using the following:

𝑥𝑖 = np.random.normal(0,1,50), which gave me
array([ 1.92024714, -0.19882742, -0.26836024,  0.32805879, -0.32085809,
       -0.23569939,  0.22310599,  0.5483915 , -0.13106083, -1.03798811,
        0.4586899 , -1.7378367 , -0.49868295,  1.58943447,  0.92153814,
        0.38894787, -1.26605208,  0.44308314,  1.10222734,  0.40031394,
       -1.2126154 ,  0.26871733, -0.85161259,  0.15853002, -0.18531145,
       -0.18069696,  0.19121711,  0.16586507,  0.43668293,  0.38395065,
       -1.02418998,  0.10464186, -0.02777545, -0.30571787,  1.0690931 ,
       -0.67266002,  2.00256049, -0.05156432, -1.03735733,  0.27650841,
       -0.53300549, -0.4301668 ,  1.01371008, -0.70780846,  0.11577668,
        0.19328765, -0.72971236,  1.61804424, -0.69770352, -1.33161613])

For each element of this array, how can I compute its first three powers to get a 50 x 3 matrix like the one below? Any suggestions?

𝑥1^1     𝑥1^2     𝑥1^3
𝑥2^1     𝑥2^2     𝑥2^3
𝑥3^1     𝑥3^2     𝑥3^3
.
.
𝑥50^1     𝑥50^2     𝑥50^3

i.e., the numbers in the 50 x 1 array above would look like this in a 50 x 3 array:

 1.92024714     3.68734907867818      7.08062152251341
-0.19882742     0.03953234294385     -0.00786011375408
-0.26836024     0.07201721841285     -0.01932655801740
.
.
.
-1.33161613     1.77320151767618     -2.36122374267808

Using np.column_stack (with a being the array from the question):

np.column_stack((a, a**2, a**3))

array([[ 1.92024714e+00,  3.68734908e+00,  7.08062152e+00],
       [-1.98827420e-01,  3.95323429e-02, -7.86011375e-03],
              ...      ,        ...     ,       ...
       [-6.97703520e-01,  4.86790202e-01, -3.39635237e-01],
       [-1.33161613e+00,  1.77320152e+00, -2.36122374e+00]])
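
The same idea extends to any degree by building the list of columns in a comprehension. This is only a minimal sketch: the small input array and the name `degree` are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical small input so the result is easy to check by eye.
a = np.array([1.0, 2.0, -3.0])
degree = 3  # assumed highest power wanted

# One column per power 1..degree, stacked side by side.
powers = np.column_stack([a**k for k in range(1, degree + 1)])

print(powers.shape)  # (3, 3)
print(powers)
```

Each row holds one input value raised to the powers 1 through `degree`.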

Here's one way, leveraging broadcasting:

a = np.random.normal(0,1,50)

out = a[:,None]**np.arange(1,4)

print(out.shape)
# (50, 3)
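
To see why this works, it may help to look at the shapes involved. `a[:, None]` turns the 1-D array into a column of shape (n, 1), and raising it to an exponent array of shape (3,) broadcasts the two into an (n, 3) result. A small sketch with a hypothetical three-element input:

```python
import numpy as np

a = np.array([1.0, 2.0, -3.0])   # hypothetical small input, shape (3,)
col = a[:, None]                 # shape (3, 1): each value in its own row
exps = np.arange(1, 4)           # shape (3,): exponents 1, 2, 3

# (3, 1) combined with (3,) broadcasts to (3, 3):
# row i, column j holds a[i] ** exps[j]
out = col ** exps

print(col.shape, exps.shape, out.shape)
print(out)
```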

What you're describing here is called a Vandermonde matrix. NumPy has this built in as np.vander, and it is faster than broadcasting on large matrices (see the timings below).

With increasing=True, the first column of the Vandermonde matrix is x^0, which is always 1, so you can slice that column off if you don't want it.


a = np.random.normal(0, 1, 50)

np.vander(a, 4, increasing=True)[:, 1:]

array([[ 4.21022633e-01,  1.77260058e-01,  7.46304963e-02],   
       [-9.37208666e-02,  8.78360084e-03, -8.23206683e-04],   
                          ...   
       [-9.02260087e-01,  8.14073265e-01, -7.34505815e-01],   
       [ 1.21125200e+00,  1.46713140e+00,  1.77706584e+00]])  

Just for a bit of validation:

>>> np.isclose(np.vander(a, 4, increasing=True)[:, 1:], a[:, None]**np.arange(1, 4)).all()
True
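
A note on column order: by default np.vander puts the highest power first, so the column of ones ends up last; with increasing=True it comes first, which is why the `[:, 1:]` slice drops it. A small sketch with a hypothetical two-element input `x`:

```python
import numpy as np

x = np.array([2.0, 3.0])  # hypothetical input, not the data from the question

decreasing = np.vander(x, 4)                   # columns: x^3, x^2, x^1, x^0
increasing = np.vander(x, 4, increasing=True)  # columns: x^0, x^1, x^2, x^3

print(decreasing)
print(increasing[:, 1:])  # drop the constant x^0 column
```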

On large matrices, vander beats broadcasting:

a = np.random.normal(0, 1, 10_000)

In [99]: %timeit np.vander(a, 100, increasing=True)[:, 1:]
8.37 ms ± 97 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [100]: %timeit a[:, None]**np.arange(1, 100)
51.4 ms ± 904 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

If you don't want consecutive powers (say, only a few selected exponents), np.vander becomes far less useful because it calculates powers you don't need; in that case, fall back to the broadcasting solution.
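
For instance, if only a handful of non-consecutive powers are needed, the broadcasting form takes the exponents directly. A minimal sketch, where the input and the choice of odd exponents are assumptions made for illustration:

```python
import numpy as np

a = np.array([1.0, 2.0, -3.0])   # hypothetical input
exponents = np.array([1, 3, 5])  # assumed: only the odd powers are wanted

# Only the requested powers are computed, one column per exponent.
out = a[:, None] ** exponents

print(out)
```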

All, many thanks for your responses. I'm a beginner at Python and it's great to see three different ways to address this problem. I read and educated myself about all three.

Thanks again!


 