
How to output model pickle file to s3 in luigi?

I have a task which trains the model, e.g.:

class ModelTrain(luigi.Task):
    def output(self):
        client = S3Client(os.getenv("CONFIG_AWS_ACCESS_KEY"),
                          os.getenv("CONFIG_AWS_SECRET_KEY"))
        model_output = os.path.join(
            "s3://", _BUCKET, exp.version + '_model.joblib')
        return S3Target(model_output, client) 

    def run(self):
        joblib.dump(model, '/tmp/model.joblib')
        with open(self.output().path, 'wb') as out_file:
            out_file.write(joblib.load('/tmp/model.joblib'))

FileNotFoundError: [Errno 2] No such file or directory: 's3://bucket/version_model.joblib'

Any pointers in this regard would be helpful.

Could you try removing .path from your open statement?

  def run(self):
    joblib.dump(model, '/tmp/model.joblib')
    with open(self.output(), 'wb') as out_file:
        out_file.write(joblib.load('/tmp/model.joblib'))

A few suggestions:

First, make sure you're using the target's own self.output().open() method instead of wrapping open(self.output().path). Wrapping the built-in open() loses the atomicity of the luigi targets. Those targets are also supposed to be swappable, so if you changed back to a LocalTarget your code should work the same way: you let the specific target class handle what it means to open the file. The error you get looks like Python is trying to find a local path, which obviously doesn't work for an s3:// URL.
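To see where the FileNotFoundError comes from, here is a minimal sketch using only the standard library. It assumes a POSIX-like filesystem, and the helper name try_builtin_open is made up for illustration. The built-in open() knows nothing about S3, so it reads the URL as a relative local path whose first directory is literally named "s3:".

```python
def try_builtin_open(path):
    """Try to open `path` for binary writing with the built-in open().

    Returns the name of the OSError subclass that was raised,
    or None if the file could be opened.
    """
    try:
        with open(path, "wb"):
            pass
    except OSError as exc:
        return type(exc).__name__
    return None


# The same kind of path the question passes to open().
# No network call happens here; the string is treated as a local path.
error_name = try_builtin_open("s3://bucket/version_model.joblib")
print(error_name)
```

Going through the target's own open() avoids this, because the target class decides how the path is interpreted.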

Second, I just ran into the same issue, so here's my solution plugged into this code:

import os

import joblib
import luigi
from luigi import format
from luigi.contrib.s3 import S3Client, S3Target

class ModelTrain(luigi.Task):
    def output(self):
        client = S3Client(os.getenv("CONFIG_AWS_ACCESS_KEY"),
                          os.getenv("CONFIG_AWS_SECRET_KEY"))
        model_output = os.path.join(
            "s3://", _BUCKET, exp.version + '_model.joblib')
        # Use luigi.format.Nop for binary files
        return S3Target(model_output, client, format=format.Nop) 

    def run(self):
        # where does `model` come from?
        with self.output().open('w') as s3_f:
            joblib.dump(model, s3_f)

My task uses pickle, so I had to follow something similar to this post to re-import.

class MyNextTask(Task):
    ...

    def run(self):
        with my_pickled_task.output().open() as f:
            # The S3Target implements a read method and then I can use
            # the `.loads()` method to import from a binary string
            results = pickle.loads(f.read())

        # ... do more stuff with results ...
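The read side can be tried out without S3. This is only a sketch: io.BytesIO stands in for the binary file object that a target opened with a binary format would hand back, and the results dictionary is made-up sample data.

```python
import io
import pickle

# Made-up stand-in for whatever the upstream task pickled.
results = {"weights": [0.1, 0.2], "version": "v1"}

# Writing side: dump into a binary file-like object.
buffer = io.BytesIO()
pickle.dump(results, buffer)

# Reading side, as in the task above: read the bytes, then loads().
buffer.seek(0)
restored = pickle.loads(buffer.read())

# Equivalent alternative: let pickle read from the file object directly.
buffer.seek(0)
restored_alt = pickle.load(buffer)

print(restored == results, restored_alt == results)
```

Both forms need the file object to return bytes rather than text, which is why the binary format matters on the target.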

I recognize this post is stale, but I'm putting the solution I found out there for the next poor soul trying to do this same thing.
