简体   繁体   English

将对象坚持到本地文件系统或S3

[英]Persist an object to either local file system or to S3

I need a method which persists an object (model) to either local file system or to an S3 bucket. 我需要一种将对象(模型)持久保存到本地文件系统或S3存储桶的方法。 The destination is determined by the environment variable MODELS_DIR . 目的地由环境变量MODELS_DIR确定。 I have two versions where the first is a little longer and I am rather confident about its correctness. 我有两个版本,第一个版本更长一些,我对它的正确性很有信心。 The second version is shorter, but I am worried that not using the with statement is actually wrong. 第二个版本较短,但我担心不使用with语句实际上是错误的。

def persist_model(model, model_name):
    """ VERSION 1
    Persist `model` under the name `model_name` to the environment variable
    `MODELS_DIR` (having a trailing '/').
    """
    MODELS_DIR = os.getenv('MODELS_DIR')

    if MODELS_DIR.startswith('s3://'):
        s3 = s3fs.S3FileSystem()
        with s3.open(MODELS_DIR[5:] + model_name, 'wb') as f:
            joblib.dump(model, f)
    else:
        with open(MODELS_DIR + model_name, 'wb') as f:
            joblib.dump(model, f)

and: 和:

def persist_model(model, model_name):
    """VERSION 2
    Persist `model` under the name `model_name` to the environment variable
    `MODELS_DIR` (having a trailing '/').
    """
    MODELS_DIR = os.getenv('MODELS_DIR')

    if MODELS_DIR.startswith('s3://'):
        s3 = s3fs.S3FileSystem()
        f = s3.open(MODELS_DIR[5:] + model_name, 'wb')
    else:
        f = open(MODELS_DIR + model_name, 'wb')

    joblib.dump(model, f)

My question is whether the second version is safe to use? 我的问题是第二版本是否可以安全使用?

Yes an no... normally you would need to close the file after you wrote (dumped) your content in it. 是的,不。。。通常,您在写入(转储)内容后需要关闭该文件。 When you use the with statement, Python will take care for that. 当您使用with语句时,Python会注意这一点。 If you skip the with, you need to use f.close() or s.close() . 如果跳过with,则需要使用f.close()s.close()

Also, to be sure that the file gets closed even in case of an error, you would need to use the try-finally construct. 另外,为确保即使在发生错误的情况下文件也被关闭,您将需要使用try-finally构造。 Therefore the second version, when used correctly, would become way longer than the first one. 因此,如果正确使用第二个版本,它将比第一个更长。

In case you would like to avoid the code duplication, I would propose the use of a function selector: 为了避免代码重复,我建议使用函数选择器:

 def persist_model(model, model_name):

     def get_file_opener(path):
        if path.startswith('s3://'):
            return s3fs.S3FileSystem().open
        else 
            return open

     full_path = os.getenv('MODELS_DIR')
     with get_file_opener(fullpath)(fullpath[5:] + model_name, 'wb') as f:
        joblib.dump(model, f)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM