SageMaker using PCA batch transform output to K-Means data type error
I have an output file, "processed_features.csv.out", from a PCA batch transform job. In S3 it looks like JSON: {"projection":[0.248282819986343 -0.494019 -0.23275601863861]}. The file is located under 's3://path1/path2/path3/model_artifacts/pca/transform/', which is also the value of pca_transformer.output_path. However, when I use this file to train a K-Means model with the code below, the training job fails.
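from time import localtime, strftime

import sagemaker

# aws_region, role, model_output_path, sagemaker_session and pca_transformer
# are defined earlier in the notebook.
job_name = "clustering-kmeans-" + strftime("%Y-%m-%d-%H-%M-%S", localtime())

# URI of the built-in K-Means training image for this region
image = sagemaker.image_uris.retrieve(
    framework="kmeans", region=aws_region, version="latest"
)

kmeans_main = sagemaker.estimator.Estimator(
    image,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    volume_size=50,
    output_path=model_output_path + "kmeans/train/" + job_name,
    sagemaker_session=sagemaker_session,
)

kmeans_main.set_hyperparameters(
    k=8,
    feature_dim=3,
)

# This is the call that fails: the channel points at the raw PCA output
kmeans_main.fit({"train": pca_transformer.output_path})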
The job failed with this error message:
ClientError: Unable to read data channel 'train'. Requested content-type is 'application/x-recordio-protobuf'. Please verify the data matches the requested content-type. (caused by MXNetError) Caused by: [20:15:32] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.4931.0/AL2_x86_64/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100: (Input Error) The header of the MXNet RecordIO record at position 0 in the dataset does not start with a valid magic number. Stack trace returned 10 entries: [bt] (0) /opt/amazon/lib/libaialgs.so(+0x9d1b) [0x7fb0d19acd1b] [bt] (1) /opt/amazon/lib/libaialgs.so(+0xa549) [0x7fb0d19ad549] [bt] (2) /opt/amazon/lib/libaialgs.so(aialgs::iterator_base::Next()+0x448) [0x7fb0d19ba128] [bt] (3) /opt/amazon/lib/libmxnet.so(MXDataIterNext+0x21) [0x7fb0b932c121] [bt] (4) /opt/amazon/lib/libffi.so.6(ffi_call_unix64+0x4c) [0x7fb0d19d0078] [bt] (5) /opt/amazon/lib/libffi.so.6(ffi_call+0x186) [0x7fb0d19cf206] [bt] (6) /opt/amazon/python3.7/lib/python3.7/lib-dynload
I found this approach in this example: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_batch_transform/introduction_to_batch_transform/batch_transform_pca_dbscan_movie_clusters.html#Batch-prediction-PCA .
I also read that PCA results are returned in application/json or application/x-recordio-protobuf format with a "projection" vector, and that this format should be accepted by K-Means. I'm not sure what I'm doing wrong here. Thanks!!
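The built-in K-means algorithm expects training data in recordIO-wrapped-protobuf or CSV format (see the Input/Output Interface for the K-Means Algorithm). The JSON output of the batch transform job is neither of these, which is why the 'train' channel cannot be read.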
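One way to act on this is to convert the PCA output to CSV before training. The following is a minimal sketch, not a definitive implementation. It assumes the ".out" file holds one valid JSON object per line with a "projection" key (the sample in the question is shown without commas, so check the real file). The helper names projections_to_csv and train_kmeans_on_csv are hypothetical, the S3 download and upload steps are left out, and the content type "text/csv;label_size=0" is the setting that, as far as I know, tells built-in unsupervised algorithms that the CSV has no label column.

```python
import csv
import io
import json


def projections_to_csv(jsonl_text: str) -> str:
    """Convert JSON Lines PCA output into header-less CSV text.

    Assumes every non-empty line is a JSON object with a "projection" list.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow(record["projection"])  # one CSV row per projected vector
    return buffer.getvalue()


def train_kmeans_on_csv(kmeans_estimator, csv_s3_uri: str) -> None:
    """Hypothetical helper: fit the estimator on a CSV file already uploaded to S3."""
    # Imported here so the conversion above runs without the SageMaker SDK installed.
    from sagemaker.inputs import TrainingInput

    train_input = TrainingInput(
        csv_s3_uri,
        content_type="text/csv;label_size=0",  # assumption: CSV with no label column
    )
    kmeans_estimator.fit({"train": train_input})


if __name__ == "__main__":
    sample = '{"projection": [0.25, -0.5, 1.0]}\n{"projection": [1.5, 2.0, -3.0]}\n'
    print(projections_to_csv(sample))
```

With the converted file uploaded to S3, train_kmeans_on_csv(kmeans_main, csv_s3_uri) would replace the failing fit call. Passing a TrainingInput with an explicit content type matters because the error message shows that the channel is otherwise read as application/x-recordio-protobuf.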