Lack of understanding of what dimensions really represent

A 2d array consists of 2 axes: axis=0 represents the rows and axis=1 represents the columns.

aa = np.random.randn(10, 2) # Here is 2d array, first axis has 10 rows and second axis has 2 columns

array([[ 0.6999521 , -0.17597954],
       [ 1.70622947, -0.85919459],
       [-0.90019284,  0.80774052],
       [-1.42953238,  0.19727917],
       [-0.03416532,  0.49584749],
       [-0.28981586, -0.77484498],
       [-1.31129122,  0.423833  ],
       [-0.43920016, -1.93541758],
       [-0.06667634,  2.09925218],
       [ 1.24633485, -0.04153847]])

Why, when I want to scatter the points, do I only use the first column and the second column along axis=1? Do dimensions mean columns when plotting and axes at other times? Can you please explain the reasons for doing it this way? I would also appreciate any good references on dimensions in this context.

plt.scatter(x[:,0], x[:,1])  # are these dimensions or columns?

Why x[:,0], x[:,1] and not x[0,:], x[:,1]?

It can be difficult to visualize this, especially in multiple dimensions.

The indices inside the [] operator correspond to the dimensions, in order. Your first dimension is the rows. The first row is array[0] . Your second dimension is the columns. The entire second column is array[:,1] -- the ":" is numpy notation that means "take all of this dimension". array[2,1] refers to the second column in the third row.

plt.scatter expects the x coordinate values as its first parameter, and the y coordinate values as its second parameter. plt.scatter(x[:,0], x[:,1]) means "take all of column 0" and "take all of column 1", which is the way scatter wants them.
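
The indexing rules above can be checked on a small array. This is only a sketch: the array a and its values are made up for illustration so the results are easy to verify by eye.

```python
import numpy as np

# A small (3, 2) array with known values (hypothetical example data).
a = np.array([[10, 11],
              [20, 21],
              [30, 31]])

first_row = a[0]       # index only the first dimension -> shape (2,)
second_col = a[:, 1]   # all of dimension 0, index 1 of dimension 1 -> shape (3,)
one_value = a[2, 1]    # third row, second column -> a single element

print(first_row)    # [10 11]
print(second_col)   # [11 21 31]
print(one_value)    # 31
```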

With this randn call you make a 2d array with the specified shape. The dimensions, 10 and 2, don't represent anything - that's an abstract (10,2) array. Meaning comes from how you use it.

In [50]: aa = np.random.randn(10, 2)
In [51]: aa
Out[51]: 
array([[-0.26769106,  0.09882999],
       [-1.5605514 , -1.38614473],
       [ 1.23312852,  0.86838848],
       [ 1.2603898 ,  2.19895989],
       [-1.66937976,  0.79666952],
       [-0.15596669,  1.47848784],
       [ 1.74964902,  0.39280584],
       [-1.0982447 ,  0.46888408],
       [ 0.84396231, -0.34809148],
       [-0.83489678, -1.8093045 ]])

That's a display - with rows and columns.

Rather than pass the slices directly to scatter, let's assign them to variables:

In [52]: x = aa[:,0]; y = aa[:,1]; x,y
Out[52]: 
(array([-0.26769106, -1.5605514 ,  1.23312852,  1.2603898 , -1.66937976,
        -0.15596669,  1.74964902, -1.0982447 ,  0.84396231, -0.83489678]),
 array([ 0.09882999, -1.38614473,  0.86838848,  2.19895989,  0.79666952,
         1.47848784,  0.39280584,  0.46888408, -0.34809148, -1.8093045 ]))

We now have two 1d arrays with shape (10,) (that's a 1 element tuple). We can then plot them with:

In [53]: plt.scatter(x,y)

I could just as well have used

x = np.arange(10); y = np.random.randn(10)

to make two 1d arrays.

The dimensions of the aa array have nothing to do with the axes of a scatter plot.

I could select a 'row' of aa , but will only get a (2,) shape array. That can't be plotted against a (10,) array:

In [53]: aa[0,:]
Out[53]: array([-0.26769106,  0.09882999])
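
The shape mismatch can be seen without plotting anything. The sketch below compares shapes only; plt.scatter requires x and y to be the same size, so a (2,) row cannot be paired with a (10,) column. The seeded default_rng generator is my own choice here (the session above uses np.random.randn), used so the example is reproducible.

```python
import numpy as np

# Assumption: a seeded generator instead of np.random.randn, for reproducibility.
rng = np.random.default_rng(0)
aa = rng.standard_normal((10, 2))

x = aa[:, 0]     # column 0: one value per point -> shape (10,)
y = aa[:, 1]     # column 1: one value per point -> shape (10,)
row = aa[0, :]   # a single point's two values   -> shape (2,)

print(x.shape, y.shape, row.shape)   # (10,) (10,) (2,)
print(x.shape == y.shape)            # True: these can be plotted against each other
print(row.shape == y.shape)          # False: these cannot

# Transposing gives a (2, 10) array, which unpacks into the same two columns.
x2, y2 = aa.T
print(np.array_equal(x, x2), np.array_equal(y, y2))   # True True
```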

As for the meaning of dimensions in sum/mean , why not experiment?

Sum all values:

In [54]: aa.sum()
Out[54]: 2.2598841819604134

Sum down the columns (along axis=0), resulting in one value per column:

In [55]: aa.sum(axis=0)
Out[55]: array([-0.49960074,  2.75948492])

It can help to use keepdims , producing a (1,2) array:

In [56]: aa.sum(axis=0, keepdims=True)
Out[56]: array([[-0.49960074,  2.75948492]])

or a (10,1) array:

In [57]: aa.sum(axis=1, keepdims=True)
Out[57]: 
array([[-0.16886107],
       [-2.94669614],
       [ 2.101517  ],
       [ 3.45934969],
       [-0.87271024],
       [ 1.32252115],
       [ 2.14245486],
       [-0.62936062],
       [ 0.49587083],
       [-2.64420128]])
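
With random values it is hard to check the sums by eye, so here is the same experiment as a sketch on a small array of known integers (the array a is made up for illustration):

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

total = a.sum()            # every element: 0+1+2+3+4+5
col_sums = a.sum(axis=0)   # collapse axis 0 (rows)    -> one value per column
row_sums = a.sum(axis=1)   # collapse axis 1 (columns) -> one value per row

print(total)      # 15
print(col_sums)   # [6 9]
print(row_sums)   # [1 5 9]

# keepdims keeps the summed axis with length 1 instead of dropping it.
print(a.sum(axis=0, keepdims=True).shape)   # (1, 2)
print(a.sum(axis=1, keepdims=True).shape)   # (3, 1)
```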

There's some ambiguity when talking about summing along rows or columns of a 2d array. It becomes clearer when we apply sum to a 1d array (where there is only one axis to sum over) or to a 3d array.

For example, note which dimension is missing when I do:

In [58]: np.arange(24).reshape(2,3,4).sum(axis=1).shape
Out[58]: (2, 4)

or

In [59]: np.arange(24).reshape(2,3,4).sum(axis=2)
Out[59]: 
array([[ 6, 22, 38],
       [54, 70, 86]])
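
The pattern is that the axis you sum over is the one that disappears from the shape. A small sketch looping over all three axes of the same (2,3,4) array; the helper name expected is only illustrative:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

shapes = {}
for axis in range(a.ndim):
    # The original shape with the summed axis removed.
    expected = tuple(n for i, n in enumerate(a.shape) if i != axis)
    shapes[axis] = a.sum(axis=axis).shape
    print(axis, shapes[axis], shapes[axis] == expected)

# 0 (3, 4) True
# 1 (2, 4) True
# 2 (2, 3) True
```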

Again - dimensions of numpy arrays are abstract things. An array can have 0, 1, 2 or more dimensions (the limit is 32 in NumPy 1.x and, as far as I know, 64 since NumPy 2.0). Most of linear algebra deals with 2d arrays, that is, matrices and "vectors". You can do linear algebra with numpy , but numpy is used for much more.
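
As a quick illustration, the ndim and shape attributes report how many dimensions an array has and how long each one is. The values below are arbitrary examples:

```python
import numpy as np

scalar = np.array(5)               # 0 dimensions
vector = np.arange(3)              # 1 dimension
matrix = np.zeros((10, 2))         # 2 dimensions
cube = np.zeros((2, 3, 4))         # 3 dimensions

for arr in (scalar, vector, matrix, cube):
    print(arr.ndim, arr.shape)

# 0 ()
# 1 (3,)
# 2 (10, 2)
# 3 (2, 3, 4)
```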
