How to change an array to SFrame for GraphLab ItemSimilarityRecommend
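I have written a custom pairwise similarity function in Python. Given a feature matrix X (one row of features per item), it finds and returns the k nearest neighbors of each item under a given similarity measure: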
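    import numpy as np
    from scipy.spatial import distance as DI  # assumed: DI refers to scipy.spatial.distance

    def print_pairwise_sim_for_graphlab(X, item_ids, metric, p, knn):
        N = len(X)
        # Full N x N matrix of pairwise distances between the rows of X
        SI = DI.squareform(DI.pdist(X, metric, p))
        q = -1
        # One output row per (item, k): item_id, similar, score, rank
        Y = np.zeros((N * knn, 4))
        for i in range(0, N):
            for k in range(1, knn + 1):
                q = q + 1
                Y[q, 0] = item_ids[i]
                # ID of the item holding the k-th largest value in row i
                Y[q, 1] = item_ids[np.argsort(SI[i, :])[-k]]
                Y[q, 2] = np.sort(SI[i, :])[-k]
                Y[q, 3] = k
        return Y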
I call it like this:
nn_SCD_min = print_pairwise_sim_for_graphlab(LL_features_SCD_min_np,item_ids,'minkowski',p,knn)
where
LL_features_SCD_min_np
array(
[[-200, -48, -127, ..., 1, 0, 1],
[-199, -38, -127, ..., 0, 0, 1],
[-202, -60, -127, ..., 1, 0, 1],
...,
[-202, -60, -127, ..., 1, 0, 1],
[-198, 56, -120, ..., 1, 0, 1],
[-202, -85, -127, ..., 1, 0, 1]])
The output looks like this:
nn_SCD_min =
array([[ 8.90000000e+01, 4.71460000e+04, 1.85300000e+03,
1.00000000e+00],
[ 8.90000000e+01, 8.11470000e+04, 1.84600000e+03,
2.00000000e+00],
[ 8.90000000e+01, 2.20700000e+03, 1.84600000e+03,
3.00000000e+00],
...,
[ 8.24630000e+04, 1.00000000e+03, 1.39300000e+03,
8.00000000e+00],
[ 8.24630000e+04, 5.98930000e+04, 1.39200000e+03,
9.00000000e+00],
[ 8.24630000e+04, 1.48900000e+03, 1.35000000e+03,
1.00000000e+01]])
In GraphLab, I want to use this output as the input to graphlab.recommender.item_similarity_recommender.create.
I use it as follows:
m2 = gl.item_similarity_recommender.create(ratings_5K, nearest_items=nn_SCD_min)
I get the following error:
87 _get_metric_tracker().track(metric_name, value=1, properties=track_props, send_sys_info=False)
88
---> 89 raise ToolkitError(str(message))
ToolkitError: Option 'nearest_items' not recognized
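I think the main reason for the error is that my nn_SCD_min needs to be passed in as an SFrame (here it is a plain array). nn_SCD_min has four columns. I believe these columns should have the following headers:
item_id, similar, score, rank
How can I convert the array 'nn_SCD_min' into an SFrame with the four headers above? Any ideas on how to do this would be greatly appreciated.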
You can create an SFrame directly from a numpy array. It will have a single column of array type. You can then unpack that column into a four-column SFrame.
>>> nearest_items = gl.SFrame(nn_SCD_min)
>>> nearest_items = nearest_items.unpack('X1', '')\
.rename({'0': 'item_id',
'1': 'similar',
'2': 'score',
'3': 'rank'})
>>> nearest_items
Columns:
item_id float
similar float
score float
rank float
Rows: 6
Data:
+---------+---------+--------+------+
| item_id | similar | score | rank |
+---------+---------+--------+------+
| 89.0 | 47146.0 | 1853.0 | 1.0 |
| 89.0 | 81147.0 | 1846.0 | 2.0 |
| 89.0 | 2207.0 | 1846.0 | 3.0 |
| 82463.0 | 1000.0 | 1393.0 | 8.0 |
| 82463.0 | 59893.0 | 1392.0 | 9.0 |
| 82463.0 | 1489.0 | 1350.0 | 10.0 |
+---------+---------+--------+------+
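Note that every column in the result above is a float, because the source array is float64. If the item IDs in ratings_5K are integers, it may be worth casting the ID and rank columns before building the SFrame. The following is only a sketch of one way to do that: it splits the array into a dict of named Python lists, which the SFrame constructor also accepts. The helper name to_named_columns and the small array nn are made up for illustration, and the final gl.SFrame call is left as a comment because it assumes GraphLab is installed and imported as gl.

```python
import numpy as np

# Hypothetical stand-in for nn_SCD_min: rows of (item_id, similar, score, rank).
nn = np.array([
    [89.0, 47146.0, 1853.0, 1.0],
    [89.0, 81147.0, 1846.0, 2.0],
    [82463.0, 1489.0, 1350.0, 10.0],
])

def to_named_columns(arr, int_cols=("item_id", "similar", "rank")):
    """Split an (n, 4) array into a dict mapping column names to Python lists."""
    names = ["item_id", "similar", "score", "rank"]
    arr = np.asarray(arr)
    if arr.ndim != 2 or arr.shape[1] != len(names):
        raise ValueError("expected an array of shape (n, 4)")
    columns = {}
    for j, name in enumerate(names):
        col = arr[:, j]
        # IDs and ranks are floats only because the whole array is float64;
        # cast them back to integers, and leave the score column as floats.
        columns[name] = col.astype(int).tolist() if name in int_cols else col.tolist()
    return columns

columns = to_named_columns(nn)

# With GraphLab available (assumed to be imported as gl), the dict can be
# passed straight to the SFrame constructor:
# nearest_items = gl.SFrame(columns)
```

Whether the integer cast is needed depends on the type of the item ID column in the ratings data, so check that first.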