How to write a TensorFlow KMeans Estimator script for SageMaker

Question

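I am trying to use TensorFlow's tf.contrib.factorization.KMeansClustering estimator in SageMaker, but I am running into trouble. The output of my SageMaker predictor.predict() call does not look correct. The cluster values are too large, since they should be integers from 0 to 7. (I have the number of clusters set to 8.)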

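I get similar output on every run (the latter half of the array is 4L or some other number, such as 0L). There are 40 values in the array because that is how many rows (users and their ratings) I pass to the predict() function.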

Example: {'outputs': {u'output': {'int64_val': [6L, 0L, 6L, 1L, 2L, 4L, 5L, 7L, 7L, 7L, 7L, 5L, 0L, 1L, 7L, 3L, 3L, 6L, 7L, 3L, 7L, 2L, 6L, 2L, 3L, 7L, 6L, 3L, 3L, 6L, 1L, 2L, 1L, 3L, 7L, 7L, 7L, 3L, 5L, 7L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L], 'dtype': 9, 'tensor_shape': {'dim': [{'size': 100L}]}}}, 'model_spec': {'signature_name': u'serving_default', 'version': {'value': 1534392971L}, 'name': u'generic_model'}}
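A quick way to inspect a response like this is to compare the number of returned values and the declared tensor shape with the number of rows that were sent. The sketch below is only illustrative: `summarize_prediction` is a hypothetical helper, and the sample dict is a shortened stand-in written with Python 3 integers (no `L` suffix), assuming the response keeps the same nesting as the output above.

```python
def summarize_prediction(response, expected_rows, num_clusters):
    """Summarize a TensorFlow Serving style response dict (hypothetical helper)."""
    output = response['outputs']['output']
    cluster_ids = [int(v) for v in output['int64_val']]
    declared = [int(d['size']) for d in output['tensor_shape']['dim']]
    return {
        'returned': len(cluster_ids),                # values actually returned
        'declared_shape': declared,                  # shape reported by the server
        'extra': len(cluster_ids) - expected_rows,   # values beyond the rows sent
        'all_in_range': all(0 <= c < num_clusters for c in cluster_ids),
    }


# Shortened stand-in for the response above: 4 rows sent, 10 values returned.
sample = {
    'outputs': {
        'output': {
            'int64_val': [6, 0, 6, 1, 4, 4, 4, 4, 4, 4],
            'dtype': 9,
            'tensor_shape': {'dim': [{'size': 10}]},
        }
    }
}

summary = summarize_prediction(sample, expected_rows=4, num_clusters=8)
print(summary)
```

For the output above, `declared_shape` would be `[100]`, which can be compared against the 40 rows passed to predict().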

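The data I am using is a sparse matrix of item ratings, where rows=users, cols=items, and each cell contains a float between 0.0 and 10. So my input data is a matrix rather than the typical array of features.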

I think the problem may be in the serving_input_fn function. Here is my SageMaker entry_point script:

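import os

import numpy as np
import tensorflow as tf

NUM_CLUSTERS = 8  # the question sets the number of clusters to 8

def estimator_fn(run_config, params):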
    #feature_columns = [tf.feature_column.numeric_column('inputs', shape=list(params['input_shape']))]
    return tf.contrib.factorization.KMeansClustering(num_clusters=NUM_CLUSTERS,
                            distance_metric=tf.contrib.factorization.KMeansClustering.COSINE_DISTANCE,
                            use_mini_batch=False,
                            feature_columns=None,
                            config=run_config)

def serving_input_fn(params):
    tensor = tf.placeholder(tf.float32, shape=[None, None])
    return tf.estimator.export.build_raw_serving_input_receiver_fn({'inputs': tensor})()

def train_input_fn(training_dir, params):
    """ Returns input function that would feed the model during training """
    return generate_input_fn(training_dir, 'train.csv')


def eval_input_fn(training_dir, params):
    """ Returns input function that would feed the model during evaluation """
    return generate_input_fn(training_dir, 'test.csv')


def generate_input_fn(training_dir, training_filename):
    """ Generate all the input data needed to train and evaluate the model. """
    # Load train/test data from s3 bucket
    train = np.loadtxt(os.path.join(training_dir, training_filename), delimiter=",")
    return tf.estimator.inputs.numpy_input_fn(
        x={'inputs': np.array(train, dtype=np.float32)},
        y=None,
        num_epochs=1,
        shuffle=False)()

In generate_input_fn(), train is the numpy ratings matrix.

In case it helps, here is my call to the predict() function (ratings_matrix is a 40 x num_items numpy array):

mtx = tf.make_tensor_proto(values=ratings_matrix,
                           shape=list(ratings_matrix.shape), dtype=tf.float32)
result = predictor.predict(mtx)

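I feel like the problem is something simple that I am missing. This is the first ML algorithm I have written, so any help would be appreciated.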

Answer 1

Thanks to javadba for the answer!

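I am not very well versed in machine learning or TensorFlow, so please correct me if I am wrong. It looks like you were able to integrate with SageMaker, but the predictions are not what you expect.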

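Ultimately, SageMaker runs training using EstimatorSpec with train_and_evaluate and uses TensorFlow Serving for predictions. It has no other hidden functionality, so the results you get from KMeans predictions using the TensorFlow estimator should be independent of SageMaker. They may, however, be affected by how you define your serving_input_fn and output_fn.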

When you run the same estimator outside the SageMaker ecosystem with the same setup, do you get predictions in the format you expect?
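As a reference point for what "the format you expect" means here, the following is a minimal sketch of nearest-center assignment under cosine distance, written in plain Python. It is not TensorFlow's KMeansClustering implementation: the helper names, the ratings, and the centers are all made up for illustration, and treating a zero vector as distance 1.0 is an assumption. It only shows that the result should be one integer in [0, num_clusters) per input row.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero vectors are treated as distance 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: no direction to compare, so use a fixed distance
    return 1.0 - dot / (norm_a * norm_b)


def assign_clusters(rows, centers):
    """Return one cluster index per row: the index of the nearest center."""
    return [
        min(range(len(centers)), key=lambda k: cosine_distance(row, centers[k]))
        for row in rows
    ]


# Made-up ratings (rows = users, cols = items) and made-up cluster centers.
ratings = [
    [9.0, 8.5, 0.0, 0.0],
    [0.0, 0.0, 7.0, 9.5],
    [10.0, 7.0, 0.0, 1.0],
]
centers = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

cluster_ids = assign_clusters(ratings, centers)
print(cluster_ids)
```

With 40 input rows and 8 centers, a check like this would produce exactly 40 indices, each between 0 and 7.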

The SageMaker TensorFlow experience is open source here, and it shows what is and is not currently possible: https://github.com/aws/sagemaker-tensorflow-container

Answer 2

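Your problem, and certainly your input dataset, seems better suited to Alternating Least Squares / Non-Negative Matrix Factorization: those methods are designed specifically for making recommendations given a user / item matrix as input.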

It appears that SageMaker may not offer that family of algorithms, but it does have Factorization Machines (https://docs.aws.amazon.com/sagemaker/latest/dg/fact-machines.html), which are likewise used for recommender systems.

Here is a blog post from Amazon on how to set it up: https://aws.amazon.com/blogs/machine-learning/build-a-movie-recommender-with-factorization-machines-on-amazon-sagemaker/. Some highlights from it:

The blog shows how to use SageMaker Factorization Machines with the MovieLens input dataset. You can draw an analogy in which your user is their user and your item is their movie.

You will need to write the data to a protobuf file, as the blog post describes.

Then you call the fit() method on their API, and you can see the results, including the F1 score, in the resulting output.
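To make the user / movie analogy concrete, here is a minimal sketch of turning a user x item ratings matrix into per-rating rows with one active feature for the user and one for the item, which is the kind of one-hot layout factorization machines typically consume. This is not the blog's code: `ratings_to_fm_rows` is a hypothetical helper, treating a cell of 0.0 as "not rated" is an assumption, and the protobuf serialization step is not shown.

```python
def ratings_to_fm_rows(ratings_matrix):
    """Convert a dense user x item ratings matrix into one-hot style rows.

    Each rated cell becomes (active_feature_indices, rating), where the
    active indices are [user_index, num_users + item_index].
    Assumption: a cell of 0.0 means "not rated" and is skipped.
    """
    num_users = len(ratings_matrix)
    rows = []
    for user, user_ratings in enumerate(ratings_matrix):
        for item, rating in enumerate(user_ratings):
            if rating != 0.0:
                rows.append(([user, num_users + item], rating))
    return rows


# Made-up matrix: 2 users, 3 items.
ratings_matrix = [
    [9.0, 0.0, 3.5],
    [0.0, 7.0, 0.0],
]

fm_rows = ratings_to_fm_rows(ratings_matrix)
for indices, rating in fm_rows:
    print(indices, rating)
```

Each output row stands for one (user, item, rating) observation, so a 40 x num_items matrix yields one row per rated cell rather than one row per user.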
