
Matrix multiplication with SFrame and SArray with Graphlab and/or Numpy

Given a graphlab.SArray named coef:

+-------------+----------------+
|     name    |     value      |
+-------------+----------------+
| (intercept) | 87910.0724924  |
| sqft_living | 315.403440552  |
|   bedrooms  | -65080.2155528 |
|  bathrooms  | 6944.02019265  |
+-------------+----------------+
[4 rows x 2 columns]

And a graphlab.SFrame named x (first 10 rows shown below):

+-------------+----------+-----------+-------------+
| sqft_living | bedrooms | bathrooms | (intercept) |
+-------------+----------+-----------+-------------+
|    1430.0   |   3.0    |    1.0    |      1      |
|    2950.0   |   4.0    |    3.0    |      1      |
|    1710.0   |   3.0    |    2.0    |      1      |
|    2320.0   |   3.0    |    2.5    |      1      |
|    1090.0   |   3.0    |    1.0    |      1      |
|    2620.0   |   4.0    |    2.5    |      1      |
|    4220.0   |   4.0    |    2.25   |      1      |
|    2250.0   |   4.0    |    2.5    |      1      |
|    1260.0   |   3.0    |    1.75   |      1      |
|    2750.0   |   4.0    |    2.0    |      1      |
+-------------+----------+-----------+-------------+
[1000 rows x 4 columns]

How do I manipulate the SArray and SFrame so that the multiplication returns a single vector (an SArray) whose first element is computed as follows?

   87910.0724924   * 1 
+    315.403440552 * 1430.0 
+ -65080.2155528   * 3.0
+   6944.02019265  * 1.0 
= 350640.36601600994
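The calculation above is simply the dot product of the first row of x with the coefficient vector, where each coefficient is paired with the feature of the same name. Here is a minimal numpy sketch that uses the rounded values printed in the tables; the variable names w and row0 are illustrative only:

```python
import numpy as np

# Coefficients in the order: (intercept), sqft_living, bedrooms, bathrooms
w = np.array([87910.0724924, 315.403440552, -65080.2155528, 6944.02019265])

# First row of x, rearranged into the same order as w
row0 = np.array([1.0, 1430.0, 3.0, 1.0])

# The sum of element-wise products is exactly the dot product
first = np.dot(row0, w)
print(first)  # approximately 350640.366
```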

I'm currently doing something clumsy: I convert the SFrame/SArray into lists and then into numpy arrays so I can use np.multiply. Even after converting to numpy arrays, it doesn't give the right matrix-vector multiplication. My current attempt:

import numpy as np
# coef: the coefficients shown above
# x: the SFrame shown above
intercept = list(x['(intercept)'])
sqftliving =  list(x['sqft_living'])
bedrooms =  list(x['bedrooms'])
bathrooms =  list(x['bathrooms'])
x_new = np.column_stack((intercept, sqftliving, bedrooms, bathrooms))

coef_new = np.array(list(coef['value']))

np.multiply(coef_new, x_new)

[out] (wrong):

[[  87910.07249236  451026.91998949 -195240.64665846    6944.02019265]
 [  87910.07249236  930440.14962867 -260320.86221128   20832.06057795]
 [  87910.07249236  539339.88334408 -195240.64665846   13888.0403853 ]
 ..., 
 [  87910.07249236  794816.67019127 -260320.86221128   17360.05048162]
 [  87910.07249236  728581.94767533 -260320.86221128   17360.05048162]
 [  87910.07249236  321711.50936313 -130160.43110564    5208.01514449]]

The output of my attempt is also wrong: it should return a single vector of scalar values. There must be an easier way to do this.


And with numpy arrays, how should one perform the matrix-vector multiplication?

I think your best bet is to convert both the SFrame and the SArray to numpy arrays and use the numpy dot method.

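import graphlab

# Toy data: a 2x3 SFrame and a length-3 SArray
sf = graphlab.SFrame({'a': [1., 2.], 'b': [3., 5.], 'c': [7., 11]})
sa = graphlab.SArray([1., 2., 3.])

# SFrame -> pandas DataFrame -> 2-D numpy array; SArray -> 1-D numpy array
X = sf.to_dataframe().values
y = sa.to_numpy()

# Matrix-vector product: one value per row of sf
ans = X.dot(y)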

I'm using simpler data here than yours, but this should work for you as well. The only complication I can see is that you have to make sure the values in your SArray are in the same order as the columns in your SFrame (in your example they aren't).
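One way to guarantee the ordering is to build the feature matrix from the coefficient names rather than from x's own column order. In GraphLab you would presumably select the columns with something like x[list(coef['name'])] before converting. The sketch below does not call GraphLab at all: it assumes a plain dict of numpy arrays as a stand-in for the SFrame and uses the first three rows from the question. The names coef_names, coef_values and x_columns are hypothetical.

```python
import numpy as np

# Stand-in for the coefficient table: names and values in the table's order
coef_names = ['(intercept)', 'sqft_living', 'bedrooms', 'bathrooms']
coef_values = np.array([87910.0724924, 315.403440552, -65080.2155528, 6944.02019265])

# Stand-in for the SFrame x (first three rows); note the different column order
x_columns = {
    'sqft_living': np.array([1430.0, 2950.0, 1710.0]),
    'bedrooms':    np.array([3.0, 4.0, 3.0]),
    'bathrooms':   np.array([1.0, 3.0, 2.0]),
    '(intercept)': np.array([1.0, 1.0, 1.0]),
}

# Stack the columns in the order given by coef_names, so that
# column j of X lines up with coef_values[j]
X = np.column_stack([x_columns[name] for name in coef_names])

# (3, 4) matrix times (4,) vector -> one prediction per row
predictions = X.dot(coef_values)
print(predictions)
```

Selecting by name means the product stays correct even if the SFrame's column order changes later.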

I think this can also be done with SFrame.apply, but unless you have a lot of data, the dot product route is probably simpler.
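For reference, the apply route computes one row at a time. SFrame.apply passes each row to the function as a dict keyed by column name, so the weights can be matched by name and column order stops mattering. The sketch below defines such a row function. So that it runs without GraphLab, it simulates apply with a list comprehension over row dicts; the GraphLab call appears only in a comment and is untested. The names weights and predict_row are hypothetical.

```python
# Hypothetical helper data: feature name -> coefficient (rounded values from the question)
weights = {
    '(intercept)': 87910.0724924,
    'sqft_living': 315.403440552,
    'bedrooms': -65080.2155528,
    'bathrooms': 6944.02019265,
}

def predict_row(row, weights):
    """Dot product of one row (a dict of column -> value) with the weights."""
    return sum(weights[name] * row[name] for name in weights)

# With GraphLab (untested here), something like:
#   predictions = x.apply(lambda row: predict_row(row, weights), dtype=float)
# Simulated below with plain dicts standing in for SFrame rows:
rows = [
    {'sqft_living': 1430.0, 'bedrooms': 3.0, 'bathrooms': 1.0, '(intercept)': 1},
    {'sqft_living': 2950.0, 'bedrooms': 4.0, 'bathrooms': 3.0, '(intercept)': 1},
]
predictions = [predict_row(row, weights) for row in rows]
print(predictions)
```

This runs the multiply-and-sum in Python for every row, which is why the vectorized dot product is usually the simpler and faster choice.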

To perform linear algebra operations on an SArray and an SFrame, you first need to convert them to numpy arrays. Make sure you get the dimensions and the column order right. (I have a coef SArray and a features SFrame, which is exactly your x.)

In [15]: coef = coef.to_numpy()
In [17]: features = features.to_numpy()

Now coef and features are both numpy arrays, so multiplying them is as easy as:

In [23]: prod = numpy.dot(features, coef)
In [24]: print(prod)

[  350640.36601601   778861.42048755   445897.34956322   641765.45839626
   243403.19622833   671306.27500907  1174215.7748441    554607.00200482
   302229.79626666   708836.7121845 ]

In [25]: prod.shape
Out[25]: (10,)
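To check the shapes involved, here is a numpy-only sketch that uses the ten rows shown in the question. The columns stay in x's order (sqft_living, bedrooms, bathrooms, (intercept)), and the coefficients are rearranged to match. Because the sketch uses the rounded coefficients from the table, the trailing digits may differ slightly from the output above.

```python
import numpy as np

# First ten rows of x, columns: sqft_living, bedrooms, bathrooms, (intercept)
features = np.array([
    [1430.0, 3.0, 1.0,  1.0],
    [2950.0, 4.0, 3.0,  1.0],
    [1710.0, 3.0, 2.0,  1.0],
    [2320.0, 3.0, 2.5,  1.0],
    [1090.0, 3.0, 1.0,  1.0],
    [2620.0, 4.0, 2.5,  1.0],
    [4220.0, 4.0, 2.25, 1.0],
    [2250.0, 4.0, 2.5,  1.0],
    [1260.0, 3.0, 1.75, 1.0],
    [2750.0, 4.0, 2.0,  1.0],
])

# Coefficients reordered to match the feature columns above
coef = np.array([315.403440552, -65080.2155528, 6944.02019265, 87910.0724924])

# (10, 4) matrix times (4,) vector -> (10,) vector
prod = np.dot(features, coef)
print(prod.shape)  # (10,)
```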

In numpy, multiply() and * perform element-wise multiplication, whereas dot() performs matrix multiplication, which is exactly what you need.
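Here is a small sketch of the difference, using a made-up 2x3 matrix. multiply and * broadcast the vector across each row and return a matrix of the same shape. Summing that matrix along axis 1 gives exactly what dot returns. On Python 3.5+ with a recent numpy, the @ operator is equivalent to dot for this case.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
w = np.array([10.0, 20.0, 30.0])

elementwise = np.multiply(A, w)      # same as A * w; shape (2, 3)
row_sums = elementwise.sum(axis=1)   # shape (2,)
matvec = A.dot(w)                    # matrix-vector product; shape (2,)

print(elementwise)
print(row_sums)   # values 140 and 320
print(matvec)     # values 140 and 320
print(A @ w)      # values 140 and 320
```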

As for your output

[[  87910.07249236  451026.91998949 -195240.64665846    6944.02019265]
 [  87910.07249236  930440.14962867 -260320.86221128   20832.06057795]
 [  87910.07249236  539339.88334408 -195240.64665846   13888.0403853 ]
 ..., 
 [  87910.07249236  794816.67019127 -260320.86221128   17360.05048162]
 [  87910.07249236  728581.94767533 -260320.86221128   17360.05048162]
 [  87910.07249236  321711.50936313 -130160.43110564    5208.01514449]]

is only half wrong. If you sum the values in each row, you get the corresponding element of the result vector. For the first row:

In [26]: 87910.07249236 + 451026.91998949 + (-195240.64665846) + 6944.02019265
Out[26]: 350640.3660160399

But dot() does all of this for you in one step, so you don't need to worry about it.

PS: Are you taking the Machine Learning Specialization? Me too, that's why I know this :-)
