
How to package vocabulary file for Cloud ML Engine

I have a .txt file which contains a different label on each line. I use this file to create a label index lookup file, for example:

label_index = tf.contrib.lookup.index_table_from_file(vocabulary_file='labels.txt')

I am wondering how I should package the vocabulary file with my Cloud ML Engine trainer. The packaging instructions are explicit about how to set up the .py files, but I am not sure where I should put related .txt files. Should they just be hosted in a storage bucket (i.e. gs://) that the engine has access to, or can they be packaged with the trainer somehow?

You have multiple options. I think the most straightforward is to store labels.txt in a GCS location.
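As a rough sketch of the GCS option, the vocabulary path can be passed to the trainer as a command-line flag, so the same code runs against a local file or a gs:// URL. The flag name --labels-file, the bucket path, and the parse_args helper are assumptions made for illustration. TensorFlow's file I/O generally accepts gs:// paths, so the parsed value can be handed to index_table_from_file unchanged.

```python
import argparse


def parse_args(argv=None):
    """Parse trainer flags; --labels-file is a hypothetical flag name."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--labels-file',
        default='labels.txt',
        help='Local path or gs:// URL of the vocabulary file.')
    return parser.parse_args(argv)


# Example with a placeholder bucket name (an assumption, not a real location).
args = parse_args(['--labels-file', 'gs://my-bucket/labels.txt'])
print(args.labels_file)

# In the trainer, the value would then be passed along unchanged, e.g.:
# label_index = tf.contrib.lookup.index_table_from_file(
#     vocabulary_file=args.labels_file)
```

Keeping the path in a flag means nothing in the package has to change when the file moves to a different bucket.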

However, if you prefer, you can also package the file up in your setup.py. There are multiple ways to do this, so I'll refer you to the official setuptools documentation.

Let me walk through a quick example:

Create a setup.py in the directory that contains your training package (often called trainer in Cloud ML Engine's samples, so I will proceed as if your code is structured the same way as the samples, including using trainer as the package name). The following is based on the docs you referenced, with one important change: it uses the package_data argument instead of include_package_data:

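from setuptools import find_packages
from setuptools import setup

# List any extra PyPI dependencies of the trainer here.
REQUIRED_PACKAGES = []

setup(
    name='my_model',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    # Ship labels.txt alongside the modules in the trainer package.
    package_data={'trainer': ['labels.txt']},
    description='My trainer application package.'
)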

If you run python setup.py sdist, you can see that trainer/labels.txt was copied into the tarball.
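If you would rather check the tarball programmatically than by eye, something like the following could work. It is only a sketch: sdist_contains is a hypothetical helper, and the tarball name in the usage comment assumes the name and version from the setup.py above.

```python
import tarfile


def sdist_contains(tarball_path, relative_name):
    """Return True if any member of the archive ends with relative_name."""
    # 'r:*' lets tarfile detect the compression (sdists are usually .tar.gz).
    with tarfile.open(tarball_path, 'r:*') as tar:
        return any(
            name.endswith('/' + relative_name) for name in tar.getnames())


# Usage (the path is an assumption based on name='my_model', version='0.1'):
# sdist_contains('dist/my_model-0.1.tar.gz', 'trainer/labels.txt')
```

Members of an sdist are stored under a top-level directory named after the project and version, which is why the helper matches on the end of each member name rather than the whole name.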

Then in your code, you can access the file like this:

from pkg_resources import resource_filename
# Resolve labels.txt relative to the installed trainer package.
labels_path = resource_filename('trainer', 'labels.txt')

Note that to run this code locally, you first have to install your package: python setup.py install [--user].

And that's the primary reason I think storing the file on GCS might be easier.
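Whichever location you choose, it can help to sanity-check a local copy of the file before training. The sketch below mirrors in plain Python the mapping that index_table_from_file builds by default, where each whole line is a key and its zero-based line number is the value. load_label_index is a hypothetical helper, and it assumes each label appears only once, as in the question.

```python
def load_label_index(path):
    """Map each label (one per line) to its zero-based line number."""
    with open(path) as f:
        labels = [line.rstrip('\n') for line in f]
    # Assumes labels are unique; a repeated label would keep its last index.
    return {label: index for index, label in enumerate(labels)}


# Usage with a local copy of the vocabulary file:
# label_index = load_label_index('labels.txt')
```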
