
How can I use a list of files as the training set on SageMaker with TensorFlow?

I have several million images in my training folder and want to train on only a subset of them. The way to do this seems to be a manifest file, as described here:

https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html

But this seems to be geared towards labelled data. How can I start a SageMaker training job with the SageMaker TensorFlow estimator's fit method, using a list of files as input instead of the entire directory?

You can set the estimator's input mode to Pipe, like so:

from sagemaker.tensorflow import TensorFlow

# Checkpoint every 1000 steps rather than on a timer.
hyperparameters = {'save_checkpoints_secs': None,
                   'save_checkpoints_steps': 1000}

# `role` is the IAM role ARN for the training job, defined elsewhere.
tf_estimator = TensorFlow(entry_point='./my-training-file', role=role,
                          training_steps=5100, evaluation_steps=100,
                          train_instance_count=1, train_instance_type='ml.p3.2xlarge',
                          input_mode='Pipe',  # stream the input instead of copying it first
                          train_volume_size=300, output_path='s3://sagemaker-pocs/test-carlsoa/kepler/model',
                          framework_version='1.12.0', hyperparameters=hyperparameters, checkpoint_path=None)

Then create an input channel that points at the manifest file:

import sagemaker

# The S3 URI points at the manifest, not at the image directory.
train_data = sagemaker.session.s3_input('s3://sagemaker-pocs/test-carlsoa/manifest.json',
                                        distribution='FullyReplicated',
                                        content_type='image/jpeg',
                                        s3_data_type='ManifestFile',
                                        attribute_names=['source-ref'])
                                        # With labels: attribute_names=['source-ref', 'annotations']
data_channels = {'train': train_data}

Set s3_data_type to ManifestFile or AugmentedManifestFile depending on whether you need to supply extra data or labels alongside each file. Then pass data_channels as the input to the TensorFlow estimator:

tf_estimator.fit(inputs=data_channels, logs=True)
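
The answer does not show the manifest file itself, so here is a minimal sketch of generating one from a list of files, using only the standard library. It assumes the two layouts described in the SageMaker documentation: a ManifestFile is a JSON array whose first element is a prefix object followed by relative keys, and an AugmentedManifestFile is JSON Lines with one object per file. The bucket name, object keys and helper function names are hypothetical, and uploading the result to S3 is left out.

```python
import json


def build_manifest(prefix, keys):
    """Return ManifestFile-style JSON: a prefix entry followed by relative keys."""
    if not prefix.endswith("/"):
        prefix += "/"
    return json.dumps([{"prefix": prefix}] + list(keys), indent=2)


def build_augmented_manifest(uris, attribute="source-ref"):
    """Return AugmentedManifestFile-style JSON Lines: one object per file."""
    return "".join(json.dumps({attribute: uri}) + "\n" for uri in uris)


# Hypothetical bucket prefix and subset of files chosen for training.
prefix = "s3://my-bucket/training/"
subset = ["images/cat_0001.jpg", "images/dog_0042.jpg"]

manifest_text = build_manifest(prefix, subset)
augmented_text = build_augmented_manifest(prefix + key for key in subset)

print(manifest_text)
print(augmented_text, end="")
```

Either string can be written to a file and uploaded to the S3 location that the input channel points at. Which layout the job expects depends on the s3_data_type chosen for the channel.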
