
Tensorflow Model Transformer gives error on model_data from previous training step property in AWS Sagemaker Workflow Pipeline

I am trying to set up an AWS SageMaker Pipeline that trains a TensorFlow model and then, once appropriate acceptance criteria have been passed, runs a batch transform step. This is all based on the tutorials provided by the Data Science On AWS group, although I have modified the code substantially (this step was not in their original code).

Here is part of the relevant code:

...

    training_step = TrainingStep(
        name="Train",
        estimator=estimator,
        inputs={
            "train": TrainingInput(
                s3_data=processing_step.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
                content_type="text/csv",
            ),
            "validation": TrainingInput(
                s3_data=processing_step.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
                content_type="text/csv",
            ),
            "test": TrainingInput(
                s3_data=processing_step.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
                content_type="text/csv",
            ),
        },
        cache_config=cache_config,
    )

...

    inference_image_uri = sagemaker.image_uris.retrieve(
        framework="tensorflow",  # todo: edit
        region=region,
        version="2.3.1",
        py_version="py37",
        instance_type=deploy_instance_type,
        image_scope="inference",
    )
    print('Inference image uri: ', inference_image_uri)

...

    # Create Model for Deployment Step
    model = Model(
        name=model_name,
        image_uri=inference_image_uri,
        model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
        sagemaker_session=sess,
        role=role,
    )

    create_inputs = CreateModelInput(
        instance_type=deploy_instance_type,
    )

    create_step = CreateModelStep(
        name=model_name,
        model=model,
        inputs=create_inputs,
    )

    # Transform Step for batch transform
    batch_env = {
        # Configures whether to enable record batching.
        'SAGEMAKER_TFS_ENABLE_BATCHING': 'true',

        # Name of the model - this is important in multi-model deployments
        'SAGEMAKER_TFS_DEFAULT_MODEL_NAME': 'saved_model',

        # Configures how long to wait for a full batch, in microseconds.
        'SAGEMAKER_TFS_BATCH_TIMEOUT_MICROS': '50000', # microseconds

        # Corresponds to "max_batch_size" in TensorFlow Serving.
        'SAGEMAKER_TFS_MAX_BATCH_SIZE': '10000',

        # Number of seconds for the SageMaker web server timeout
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '7200', # Seconds

        # Configures number of batches that can be enqueued.
        'SAGEMAKER_TFS_MAX_ENQUEUED_BATCHES': '10000'
    }

    batch_transformer = model.transformer(
        instance_type=deploy_instance_type.default_value,
        instance_count=deploy_instance_count.default_value,
        output_path=f"{raw_input_data_s3_uri}output/",
        strategy='MultiRecord',
        env=batch_env,
        assemble_with='Line',
        accept='text/csv',
        max_concurrent_transforms=1,
        max_payload=1,  # This is in Megabytes (not number of records)
    )

    transform_inputs = TransformInput(
        data=raw_input_data_s3_uri,
        data_type='S3Prefix',
        content_type='application/json',
        split_type='Line',
        compression_type='Gzip',
    )

    transform_step = TransformStep(
        name="Transform",  # step names must be unique within a pipeline, so this cannot reuse create_step.name
        transformer=batch_transformer,
        cache_config=cache_config,
        inputs=transform_inputs
    )

...

Here is the error I get when it tries to run the model.transformer line:

sgmkr_1  | Object of type 'Properties' is not JSON serializable: TypeError
sgmkr_1  | Traceback (most recent call last):
sgmkr_1  |   File "/var/task/lambda_function.py", line 53, in lambda_handler
sgmkr_1  |     model_package_group_name=model_package_group_name
sgmkr_1  |   File "/var/task/pipeline_definition_template.py", line 722, in get_pipeline
sgmkr_1  |     max_payload=1,  # This is in Megabytes (not number of records)
sgmkr_1  |   File "/var/lang/lib/python3.6/site-packages/sagemaker/model.py", line 842, in transformer
sgmkr_1  |     self._create_sagemaker_model(instance_type, tags=tags)
sgmkr_1  |   File "/var/lang/lib/python3.6/site-packages/sagemaker/model.py", line 331, in _create_sagemaker_model
sgmkr_1  |     tags=tags,
sgmkr_1  |   File "/var/lang/lib/python3.6/site-packages/sagemaker/session.py", line 2530, in create_model
sgmkr_1  |     LOGGER.debug("CreateModel request: %s", json.dumps(create_model_request, indent=4))
sgmkr_1  |   File "/var/lang/lib/python3.6/json/__init__.py", line 238, in dumps
sgmkr_1  |     **kw).encode(obj)
sgmkr_1  |   File "/var/lang/lib/python3.6/json/encoder.py", line 201, in encode
sgmkr_1  |     chunks = list(chunks)
sgmkr_1  |   File "/var/lang/lib/python3.6/json/encoder.py", line 430, in _iterencode
sgmkr_1  |     yield from _iterencode_dict(o, _current_indent_level)
sgmkr_1  |   File "/var/lang/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
sgmkr_1  |     yield from chunks
sgmkr_1  |   File "/var/lang/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
sgmkr_1  |     yield from chunks
sgmkr_1  |   File "/var/lang/lib/python3.6/json/encoder.py", line 437, in _iterencode
sgmkr_1  |     o = _default(o)
sgmkr_1  |   File "/var/lang/lib/python3.6/json/encoder.py", line 180, in default
sgmkr_1  |     o.__class__.__name__)
sgmkr_1  | TypeError: Object of type 'Properties' is not JSON serializable

It appears to be trying to log the definition of the CreateModel request, which it cannot do because the definition contains a class object. Since the logging presumably isn't critical to the rest of the code, I wonder whether this is just a bug, or whether there is something I can do to work around it.
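
To see the failure in isolation, here is a minimal sketch (mine, not from the original post, reusing the training_step defined above): the properties object is a placeholder that only resolves to a concrete S3 URI when the pipeline actually executes, so anything that tries to json.dumps() it at definition time, as the create_model debug logging does, raises this TypeError:

    import json

    # model_data is a sagemaker.workflow Properties placeholder, not a string;
    # it resolves to the real S3 URI only at pipeline execution time.
    model_data = training_step.properties.ModelArtifacts.S3ModelArtifacts

    try:
        # Mirrors what sagemaker.session's create_model() does when it
        # debug-logs the CreateModel request at pipeline definition time.
        json.dumps({"ModelDataUrl": model_data})
    except TypeError as err:
        print(err)  # Object of type 'Properties' is not JSON serializable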

It turned out the problem was the way I was setting up the transformer. I had assumed I needed to create the dependency by calling the transformer method on the model itself, when in fact I just needed to reference the name of the model created in create_step from the transformer. So instead of this:

    batch_transformer = model.transformer(
        instance_type=deploy_instance_type.default_value,
        instance_count=deploy_instance_count.default_value,
        output_path=f"{raw_input_data_s3_uri}output/",
        strategy='MultiRecord',
        env=batch_env,
        assemble_with='Line',
        accept='text/csv',
        max_concurrent_transforms=1,
        max_payload=1,  # This is in Megabytes (not number of records)
    )

I needed this:

    batch_transformer = Transformer(
        model_name=create_step.properties.ModelName,
        instance_type=deploy_instance_type.default_value,
        instance_count=deploy_instance_count.default_value,
        output_path=f"{raw_input_data_s3_uri}output/",
        strategy='MultiRecord',
        env=batch_env,
        assemble_with='Line',
        accept='text/csv',
        max_concurrent_transforms=1,
        max_payload=1,  # This is in Megabytes (not number of records)
    )
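
For completeness, a hedged sketch (mine, not from the original post) of how this wires together: the generic Transformer class comes from sagemaker.transformer, and because it references create_step.properties.ModelName, SageMaker Pipelines can infer the CreateModel-before-Transform ordering without an explicit depends_on. The pipeline name below is made up:

    from sagemaker.transformer import Transformer
    from sagemaker.workflow.pipeline import Pipeline

    # Step and variable names reuse the excerpt above; the property
    # reference inside batch_transformer creates the step dependency.
    pipeline = Pipeline(
        name="tf-batch-transform-pipeline",  # hypothetical name
        parameters=[deploy_instance_type, deploy_instance_count],
        steps=[processing_step, training_step, create_step, transform_step],
        sagemaker_session=sess,
    )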
