SageMaker: data type error when using PCA batch transform output as K-Means input

Question

I have an output file from a PCA batch transform job, "processed_features.csv.out", which in S3 looks like JSON: {"projection":[0.248282819986343 -0.494019 -0.23275601863861]}. I can retrieve this file at 's3://path1/path2/path3/model_artifacts/pca/transform/', which is also the location returned by pca_transformer.output_path. However, when I try to use this file to train a K-Means model with the code below, the job fails.

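from time import localtime, strftime

import sagemaker

# Timestamped name so that each training job is unique
job_name = "clustering-kmeans-" + strftime("%Y-%m-%d-%H-%M-%S", localtime())

# Resolve the built-in K-Means container image for this region
image = sagemaker.image_uris.retrieve(
    framework="kmeans", region=aws_region, version="latest"
)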

kmeans_main = sagemaker.estimator.Estimator(
    image,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    volume_size=50,
    output_path= model_output_path + 'kmeans/train/'+ job_name,
    sagemaker_session=sagemaker_session,
)

kmeans_main.set_hyperparameters(
    k = 8,
    feature_dim = 3,
)

kmeans_main.fit({'train': pca_transformer.output_path})

The job fails with this error message:

ClientError: Unable to read data channel 'train'. Requested content-type is 'application/x-recordio-protobuf'. Please verify the data matches the requested content-type. (caused by MXNetError) Caused by: [20:15:32] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.4931.0/AL2_x86_64/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100: (Input Error) The header of the MXNet RecordIO record at position 0 in the dataset does not start with a valid magic number. Stack trace returned 10 entries: [bt] (0) /opt/amazon/lib/libaialgs.so(+0x9d1b) [0x7fb0d19acd1b] [bt] (1) /opt/amazon/lib/libaialgs.so(+0xa549) [0x7fb0d19ad549] [bt] (2) /opt/amazon/lib/libaialgs.so(aialgs::iterator_base::Next()+0x448) [0x7fb0d19ba128] [bt] (3) /opt/amazon/lib/libmxnet.so(MXDataIterNext+0x21) [0x7fb0b932c121] [bt] (4) /opt/amazon/lib/libffi.so.6(ffi_call_unix64+0x4c) [0x7fb0d19d0078] [bt] (5) /opt/amazon/lib/libffi.so.6(ffi_call+0x186) [0x7fb0d19cf206] [bt] (6) /opt/amazon/python3.7/lib/python3.7/lib-dynload

I found this approach in the following example: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_batch_transform/introduction_to_batch_transform/batch_transform_pca_dbscan_movie_clusters.html#Batch-prediction-PCA .

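I also read that PCA results are returned in application/json or application/x-recordio-protobuf format, with a "projection" vector, and that this format should be accepted by K-Means. I am not sure what I am doing wrong here. Thanks!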

Answer 1

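The built-in K-Means algorithm requires training data in either recordIO-wrapped-protobuf or CSV format (see the Input/Output Interface for the K-Means Algorithm). The PCA batch transform output here is JSON, which is neither. Because the 'train' channel is requested as 'application/x-recordio-protobuf', the job tries to read the JSON file as RecordIO and fails on the first record with the "valid magic number" error.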
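One way to act on this is to convert the transform output to CSV before training. The following is a minimal sketch, not a definitive fix. It assumes the output file holds one valid JSON object per line with a "projection" list. The helper name `projections_to_csv` and the variable `csv_s3_uri` are hypothetical and appear only for illustration.

```python
import csv
import io
import json


def projections_to_csv(jsonl_text: str) -> str:
    """Turn lines like {"projection": [..]} into header-less CSV rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        writer.writerow(record["projection"])
    return buffer.getvalue()


# Made-up sample values standing in for the real transform output
sample = '{"projection": [0.25, -0.5, -0.125]}\n{"projection": [1.0, 2.0, 3.0]}\n'
csv_text = projections_to_csv(sample)
print(csv_text)

# Assumed follow-up, not executed here: upload csv_text to S3 (csv_s3_uri is a
# hypothetical location) and declare the content type on the channel:
#   from sagemaker.inputs import TrainingInput
#   train_input = TrainingInput(csv_s3_uri, content_type="text/csv;label_size=0")
#   kmeans_main.fit({"train": train_input})
```

The `label_size=0` suffix tells the built-in algorithm that the CSV has no label column, which is the usual setting for unsupervised training. With it, each row should contain exactly `feature_dim` values (3 in the question).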

Answer 1 posted 2022-02-24 20:54:09
