
How can I specify content_type in a training job of XGBoost from Sagemaker in Python?

I am trying to train a model using the sagemaker library. So far, my code is the following:

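# Imports assume version 1.x of the sagemaker Python SDK
import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.workflow.airflow import training_config

container = get_image_uri(boto3.Session().region_name,
                          'xgboost',
                          repo_version='0.90-1')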

estimator = sagemaker.estimator.Estimator(container, 
                                          role = 'AmazonSageMaker-ExecutionRole-20190305TXXX',
                                          train_instance_count = 1,
                                          train_instance_type = 'ml.m4.2xlarge',
                                          output_path = 's3://antifraud/production/',
                                          hyperparameters = {'num_rounds':'400',
                                                             'objective':'binary:logistic',
                                                             'eval_metric':'error@0.1'})

train_config = training_config(estimator=estimator,
                               inputs = {'train':'s3://antifraud/production/train',
                                         'validation':'s3://-antifraud/production/validation'})

However, I get an error parsing the hyperparameters. This command gives me a configuration JSON output in the console. I have been able to run a training job using boto3 with the configuration as JSON, so I have figured out that what is missing from the JSON configuration generated by my code is the content_type parameter, which should appear as follows:

"InputDataConfig": [
    {
        "ChannelName": "train",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://antifraud/production/data/train",
                "S3DataDistributionType": "FullyReplicated" 
            }
        },
        "ContentType": "text/csv",
        "CompressionType": "None"
    },
    {
        "ChannelName": "validation",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://antifraud/production/validation",
                "S3DataDistributionType": "FullyReplicated"
            }
        },
        "ContentType": "text/csv",
        "CompressionType": "None"
    }
]
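
For reference, here is a minimal sketch of how channel entries of the shape above could be built programmatically, for example before passing the configuration to boto3. The helper name make_channel and its default content type are hypothetical and are not part of boto3 or the sagemaker SDK; the field names and values are copied from the JSON above.

```python
import json


def make_channel(name, s3_uri, content_type="text/csv"):
    """Build one InputDataConfig entry (hypothetical helper, not an SDK call)."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        # The field the generated configuration was missing
        "ContentType": content_type,
        "CompressionType": "None",
    }


input_data_config = [
    make_channel("train", "s3://antifraud/production/data/train"),
    make_channel("validation", "s3://antifraud/production/validation"),
]

# Print the list in the same JSON layout as shown above
print(json.dumps(input_data_config, indent=4))
```

Keeping the content type as an explicit argument makes it harder to forget when a new channel is added.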

I have tried passing content_type = 'text/csv' as a parameter to container, estimator and train_config, and also as another key inside the inputs dictionary, with no success. How can I make this work?

I have solved it using s3_input objects:

s3_input_train = sagemaker.s3_input(
    s3_data='s3://antifraud/production/data/{domain}-{product}-{today}/train_data.csv',
    content_type='text/csv')
s3_input_validation = sagemaker.s3_input(
    s3_data='s3://antifraud/production/data/{domain}-{product}-{today}/validation_data.csv',
    content_type='text/csv')

train_config = training_config(estimator=estimator,
                               inputs={'train': s3_input_train,
                                       'validation': s3_input_validation})
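
One way to confirm that the content type actually reached the generated configuration is to inspect its InputDataConfig list before submitting the job. The sketch below is only an illustration: channels_missing_content_type is a hypothetical helper, and example_config is a hand-written stand-in for the dictionary returned by training_config, assuming it has the InputDataConfig layout shown in the question.

```python
def channels_missing_content_type(config):
    """Return the names of channels in config['InputDataConfig'] that lack a ContentType."""
    return [
        channel.get("ChannelName", "<unnamed>")
        for channel in config.get("InputDataConfig", [])
        if not channel.get("ContentType")
    ]


# Hand-written stand-in for a generated training configuration
example_config = {
    "InputDataConfig": [
        {"ChannelName": "train", "ContentType": "text/csv"},
        {"ChannelName": "validation"},  # ContentType deliberately left out
    ]
}

missing = channels_missing_content_type(example_config)
print(missing)
```

An empty list means every channel declares a content type; with real data, the same function could be called on train_config.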

