Matrix multiplication with SFrame and SArray with Graphlab and/or Numpy
Given a graphlab.SArray named coef:
+-------------+----------------+
| name | value |
+-------------+----------------+
| (intercept) | 87910.0724924 |
| sqft_living | 315.403440552 |
| bedrooms | -65080.2155528 |
| bathrooms | 6944.02019265 |
+-------------+----------------+
[4 rows x 2 columns]
And a graphlab.SFrame named x (first 10 rows shown below):
+-------------+----------+-----------+-------------+
| sqft_living | bedrooms | bathrooms | (intercept) |
+-------------+----------+-----------+-------------+
| 1430.0 | 3.0 | 1.0 | 1 |
| 2950.0 | 4.0 | 3.0 | 1 |
| 1710.0 | 3.0 | 2.0 | 1 |
| 2320.0 | 3.0 | 2.5 | 1 |
| 1090.0 | 3.0 | 1.0 | 1 |
| 2620.0 | 4.0 | 2.5 | 1 |
| 4220.0 | 4.0 | 2.25 | 1 |
| 2250.0 | 4.0 | 2.5 | 1 |
| 1260.0 | 3.0 | 1.75 | 1 |
| 2750.0 | 4.0 | 2.0 | 1 |
+-------------+----------+-----------+-------------+
[1000 rows x 4 columns]
How do I manipulate the SArray and SFrame so that the multiplication returns a single vector (an SArray) whose first element is computed as below?
87910.0724924 * 1
+ 315.403440552 * 1430.0
+ -65080.2155528 * 3.0
+ 6944.02019265 * 1.0
= 350640.36601600994
I'm currently doing silly things: converting the SFrame / SArray into lists and then converting those into numpy arrays to call np.multiply. Even after converting to numpy arrays, it doesn't give the right matrix-vector multiplication. My current attempt:
import numpy as np
coef # as shown in the SArray above.
x # as shown in the SFrame above.
intercept = list(x['(intercept)'])
sqftliving = list(x['sqft_living'])
bedrooms = list(x['bedrooms'])
bathrooms = list(x['bathrooms'])
x_new = np.column_stack((intercept, sqftliving, bedrooms, bathrooms))
coef_new = np.array(list(coef['value']))
np.multiply(coef_new, x_new)
(wrong) [out]:
[[ 87910.07249236 451026.91998949 -195240.64665846 6944.02019265]
[ 87910.07249236 930440.14962867 -260320.86221128 20832.06057795]
[ 87910.07249236 539339.88334408 -195240.64665846 13888.0403853 ]
...,
[ 87910.07249236 794816.67019127 -260320.86221128 17360.05048162]
[ 87910.07249236 728581.94767533 -260320.86221128 17360.05048162]
[ 87910.07249236 321711.50936313 -130160.43110564 5208.01514449]]
The output of my attempt is wrong too: it should return a single vector of scalar values. There must be an easier way to do it.
And with numpy arrays, how should one perform the matrix-vector multiplication?
I think your best bet is to convert both the SFrame and the SArray to numpy arrays and use the numpy dot method.
import graphlab
sf = graphlab.SFrame({'a': [1., 2.], 'b': [3., 5.], 'c': [7., 11]})
sa = graphlab.SArray([1., 2., 3.])
X = sf.to_dataframe().values
y = sa.to_numpy()
ans = X.dot(y)
I'm using simpler data here than what you have, but this should work for you as well. The only complication I can see is that you have to make sure the values in your SArray are in the same order as the columns in your SFrame (in your example they aren't).
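To illustrate the ordering issue, here is a minimal sketch in plain numpy. The names coef_by_name, columns and w are made up for this example, and the values are hard-coded from the question's tables (first three rows only); with real data they would come from the SArray and SFrame instead.

```python
import numpy as np

# Coefficients keyed by name, copied from the question's `coef` table.
coef_by_name = {
    "(intercept)": 87910.0724924,
    "sqft_living": 315.403440552,
    "bedrooms": -65080.2155528,
    "bathrooms": 6944.02019265,
}

# Column order of the feature matrix, as displayed for the question's `x`.
columns = ["sqft_living", "bedrooms", "bathrooms", "(intercept)"]

# First three rows of `x`, in that column order.
X = np.array([
    [1430.0, 3.0, 1.0, 1.0],
    [2950.0, 4.0, 3.0, 1.0],
    [1710.0, 3.0, 2.0, 1.0],
])

# Reorder the coefficients so that entry i lines up with column i of X.
w = np.array([coef_by_name[name] for name in columns])

predictions = X.dot(w)
print(predictions)
```

Looking the coefficients up by name, rather than relying on their position, keeps the result correct even if the column order of the feature matrix changes.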
I think this can be done with an SFrame apply as well, but unless you have a lot of data, the dot product route is probably simpler.
To perform linear algebra operations on an SArray and an SFrame, you first need to convert them to NumPy arrays. Make sure that you get the dimensions and the order of the columns right. (I have a coef SArray and a features SFrame, which is exactly your x.)
In [15]: coef = coef.to_numpy()
In [17]: features = features.to_numpy()
Now coef and features are both NumPy arrays, so multiplying them is as easy as:
In [23]: prod = numpy.dot(features, coef)
In [24]: print prod
[ 350640.36601601 778861.42048755 445897.34956322 641765.45839626
243403.19622833 671306.27500907 1174215.7748441 554607.00200482
302229.79626666 708836.7121845 ]
In [25]: prod.shape
Out[25]: (10,)
In NumPy, multiply() and * perform element-wise multiplication, whereas dot() performs matrix multiplication, which is exactly what you need.
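A small toy example (the arrays A and v are made up here, not taken from the question) shows the difference in the shapes of the results:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([10.0, 20.0, 30.0])

# Element-wise: v is broadcast across every row of A, so the result
# has the same shape as A.
elementwise = np.multiply(A, v)   # equivalent to A * v

# Matrix-vector product: each row of A is multiplied by v and summed,
# giving one scalar per row.
matrix_vector = np.dot(A, v)      # equivalent to A.dot(v)

print(elementwise.shape)    # (2, 3)
print(matrix_vector.shape)  # (2,)
print(matrix_vector)        # [140. 320.]
```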
Besides, your output
[[ 87910.07249236 451026.91998949 -195240.64665846 6944.02019265]
[ 87910.07249236 930440.14962867 -260320.86221128 20832.06057795]
[ 87910.07249236 539339.88334408 -195240.64665846 13888.0403853 ]
...,
[ 87910.07249236 794816.67019127 -260320.86221128 17360.05048162]
[ 87910.07249236 728581.94767533 -260320.86221128 17360.05048162]
[ 87910.07249236 321711.50936313 -130160.43110564 5208.01514449]]
is only half wrong. If you now sum the values in each row, you get the corresponding element of the result vector; for the first row:
In [26]: 87910.07249236 + 451026.91998949 + (-195240.64665846) + 6944.02019265
Out[26]: 350640.3660160399
But dot() does all this for you, so you don't need to worry.
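As a quick check of that equivalence, the sketch below rebuilds the question's coef_new and the first three rows of x_new by hand (values copied from the tables above) and compares the row sums of the element-wise product with the dot product:

```python
import numpy as np

# Coefficients in the order (intercept), sqft_living, bedrooms, bathrooms.
coef_new = np.array([87910.0724924, 315.403440552, -65080.2155528, 6944.02019265])

# First three rows of the question's x_new, columns in the same order.
x_new = np.array([
    [1.0, 1430.0, 3.0, 1.0],
    [1.0, 2950.0, 4.0, 3.0],
    [1.0, 1710.0, 3.0, 2.0],
])

# Element-wise product, then sum across each row ...
row_sums = np.multiply(coef_new, x_new).sum(axis=1)

# ... which should match the matrix-vector product.
dot_result = x_new.dot(coef_new)

same = np.allclose(row_sums, dot_result)
print(row_sums)
print(same)
```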
P.S. Are you in the Machine Learning Specialization? Me too, that's why I know this :-)