How to change an array to SFrame for GraphLab ItemSimilarityRecommend

I have written a custom pairwise similarity function in Python. Given a feature matrix X (one row of features per item), it finds the k nearest neighbors of each item under a given similarity metric and returns them:

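import numpy as np
# DI is assumed to be scipy.spatial.distance (the import is not shown in the question)
import scipy.spatial.distance as DI

def print_pairwise_sim_for_graphlab(X, item_ids, metric, p, knn):
    N = len(X)
    # Square N x N matrix of pairwise values between the rows of X
    SI = DI.squareform(DI.pdist(X, metric, p))
    q = -1
    # One output row per (item, k): item_id, similar, score, rank
    Y = np.zeros((N*knn, 4))
    for i in range(0, N):
        for k in range(1, knn+1):
            q = q + 1
            Y[q,0] = item_ids[i]
            # k-th largest entry in row i and the item it belongs to
            Y[q,1] = item_ids[np.argsort(SI[i,:])[-k]]
            Y[q,2] = np.sort(SI[i,:])[-k]
            Y[q,3] = k

    return Y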
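
To make the indexing in the function above concrete, here is a minimal sketch on toy data (the values are hypothetical and not taken from the question). It shows what `squareform(pdist(...))` returns and which entry `np.argsort(row)[-k]` selects. Since `pdist` computes distances, the last position of the sorted order is the largest distance in the row, which may or may not be what is intended when the values are treated as similarities.

```python
import numpy as np
from scipy.spatial import distance as DI

# Toy feature matrix (hypothetical values): three items, two features each.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# pdist returns the condensed pairwise distances; squareform expands them
# into a symmetric N x N matrix with zeros on the diagonal.
SI = DI.squareform(DI.pdist(X, 'minkowski', p=2))

row = SI[0, :]              # distances from item 0 to every item
order = np.argsort(row)     # indices sorted by ascending distance

largest_idx = order[-1]     # what the function's [-k] picks for k = 1
smallest_idx = order[1]     # closest other item (order[0] is item 0 itself)
```

Here item 0 is at distance 5 from item 1 and distance 10 from item 2, so `order[-1]` points at item 2 and `order[1]` at item 1.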

I call it like this:

  nn_SCD_min = print_pairwise_sim_for_graphlab(LL_features_SCD_min_np,item_ids,'minkowski',p,knn)

where LL_features_SCD_min_np is:

  array(
   [[-200,  -48, -127, ...,    1,    0,    1],
   [-199,  -38, -127, ...,    0,    0,    1],
   [-202,  -60, -127, ...,    1,    0,    1],
   ..., 
   [-202,  -60, -127, ...,    1,    0,    1],
   [-198,   56, -120, ...,    1,    0,    1],
   [-202,  -85, -127, ...,    1,    0,    1]])

The output looks like this:

  nn_SCD_min = 
  array([[  8.90000000e+01,   4.71460000e+04,   1.85300000e+03,
      1.00000000e+00],
   [  8.90000000e+01,   8.11470000e+04,   1.84600000e+03,
      2.00000000e+00],
   [  8.90000000e+01,   2.20700000e+03,   1.84600000e+03,
      3.00000000e+00],
   ..., 
   [  8.24630000e+04,   1.00000000e+03,   1.39300000e+03,
      8.00000000e+00],
   [  8.24630000e+04,   5.98930000e+04,   1.39200000e+03,
      9.00000000e+00],
   [  8.24630000e+04,   1.48900000e+03,   1.35000000e+03,
      1.00000000e+01]])

In GraphLab, I want to use this output as the input to graphlab.recommender.item_similarity_recommender.create.

I use it as follows:

 m2 = gl.item_similarity_recommender.create(ratings_5K, nearest_items=nn_SCD_min)

and I get the following error:

   87         _get_metric_tracker().track(metric_name, value=1, properties=track_props, send_sys_info=False)
   88 
---> 89         raise ToolkitError(str(message))

  ToolkitError: Option 'nearest_items' not recognized

I think the main reason for the error is that nn_SCD_min needs to be passed as an SFrame (here it is a NumPy array). nn_SCD_min has four columns, and I believe they should have the following headers:

    item_id, similar, score, rank

How can I change the array 'nn_SCD_min' to an SFrame with the above four headers? Any ideas on how to do this are greatly appreciated.

You can create an SFrame directly from a numpy array. It will have a single column of array type. Then you can unpack that into a four-column SFrame.

>>> nearest_items = gl.SFrame(nn_SCD_min)
>>> nearest_items = nearest_items.unpack('X1', '')\
                                 .rename({'0': 'item_id', 
                                          '1': 'similar', 
                                          '2': 'score', 
                                          '3': 'rank'})

>>> nearest_items
Columns:
    item_id float
    similar float
    score   float
    rank    float

Rows: 6

Data:
+---------+---------+--------+------+
| item_id | similar | score  | rank |
+---------+---------+--------+------+
|   89.0  | 47146.0 | 1853.0 | 1.0  |
|   89.0  | 81147.0 | 1846.0 | 2.0  |
|   89.0  |  2207.0 | 1846.0 | 3.0  |
| 82463.0 |  1000.0 | 1393.0 | 8.0  |
| 82463.0 | 59893.0 | 1392.0 | 9.0  |
| 82463.0 |  1489.0 | 1350.0 | 10.0 |
+---------+---------+--------+------+
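
Because the source array is float-typed, all four unpacked columns come out as float. If integer item IDs and ranks are wanted (for example, to match an integer `item_id` column in the ratings data), one option is to split the array into typed columns first. The following is a minimal sketch using NumPy only; the helper name `to_named_columns` is hypothetical, and handing the resulting dict to `gl.SFrame(...)` is an assumption that is not exercised here.

```python
import numpy as np

def to_named_columns(arr):
    """Split an (n, 4) float array into a dict of typed Python lists."""
    arr = np.asarray(arr)
    return {
        # astype(int) truncates, which is safe here because the IDs and
        # ranks are stored as whole-number floats.
        'item_id': arr[:, 0].astype(int).tolist(),
        'similar': arr[:, 1].astype(int).tolist(),
        'score':   arr[:, 2].astype(float).tolist(),
        'rank':    arr[:, 3].astype(int).tolist(),
    }

# The first three rows of nn_SCD_min shown in the question.
sample = np.array([[89.0, 47146.0, 1853.0, 1.0],
                   [89.0, 81147.0, 1846.0, 2.0],
                   [89.0,  2207.0, 1846.0, 3.0]])

columns = to_named_columns(sample)
# Assumption: with GraphLab Create installed, gl.SFrame(columns) would then
# build a four-column SFrame from this dict of lists.
```

Doing the conversion before building the SFrame keeps the column names and types in one place, at the cost of an extra copy of the data.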
