Implement an optional context manager in Python

We have a codebase with the following use pattern:

factory = DataFactory(args)
dataset = factory.download_and_cache_big_dataset(key)
metadata = dataset.get_some_metadata()

Currently, download_and_cache_big_dataset fetches a very large file from S3 and puts it somewhere. Among other things, it does

filename = get_s3_key(key)
filepath = os.path.join(get_tmp_dir(), filename)
s3.download_file(key, filepath)
return BigFileClass(filepath) # gets stored in a class somewhere

However, this file doesn't get deleted. This is fine when this function is called sparingly and relies on file caching, but bad when it is called repeatedly and we don't want to fill up the disk. Is there a way to refactor the code with a context manager such that we can use it as

factory = DataFactory(args)
with factory.download_and_cache_big_dataset(key) as dataset:
    metadata = dataset.get_some_metadata()
    # do something with metadata

# file gets automatically deleted

But critically, this must not break the existing usage, so that the other code keeps working as is. Or will there need to be a separate method that returns the context manager?

Since you return an instance of BigFileClass to handle/represent the data, I would suggest the following.

I'm assuming that the data file is unique to each instance.

  • Add an instance variable to BigFileClass to keep track of the path of the data file.
  • Add a __del__ method to BigFileClass in which the data file is removed.
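The two bullet points above could look roughly like the sketch below. The body of BigFileClass is not shown in the question, so everything here apart from the class name and get_some_metadata is an assumption, including the attribute name filepath and the placeholder metadata.

```python
import os


class BigFileClass:
    """Hypothetical stand-in for the real class; only the cleanup logic matters."""

    def __init__(self, filepath):
        # Instance variable that keeps track of the path of the data file.
        self.filepath = filepath

    def get_some_metadata(self):
        # Placeholder: the real method would read metadata from the file.
        return {"size": os.path.getsize(self.filepath)}

    def __del__(self):
        # Remove the data file when the instance is finalized.
        # getattr guards against __init__ having failed before filepath was set.
        path = getattr(self, "filepath", None)
        if path is not None:
            try:
                os.remove(path)
            except FileNotFoundError:
                # The file is already gone, so there is nothing to clean up.
                pass
```

Existing callers do not change: they keep the returned instance for as long as they need it, and the file is removed once the last reference is dropped. Note that CPython runs __del__ as soon as the reference count reaches 0, but the language does not guarantee when, or whether, it runs on other implementations or at interpreter exit.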

Edit: If you want to use BigFileClass as a context manager, define __enter__ and __exit__ methods for BigFileClass. The only thing that __enter__ has to do in this case is return self.

I would leave the task of deleting the file to the __del__ method (which runs when the reference count for a BigFileClass instance reaches 0). It doesn't feel right to have the class instance still around when you have already deleted the data file.
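A minimal sketch of that combination, again with a hypothetical BigFileClass body (the filepath attribute and the no-op __exit__ are assumptions, not the real class):

```python
import os


class BigFileClass:
    """Hypothetical stand-in; works both as a plain object and in a with block."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __enter__(self):
        # Nothing to set up; just hand the instance to the with block.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Deliberately does not delete the file; that is left to __del__.
        # Returning False lets any exception from the with block propagate.
        return False

    def __del__(self):
        # Remove the data file once no references to the instance remain.
        path = getattr(self, "filepath", None)
        if path is not None:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
```

With this design the old call style and the new with style both work on the same return value. One consequence worth knowing: the name bound by `as dataset` still refers to the instance after the with block ends, so the file is only removed once that name is rebound, deleted, or goes out of scope. If the file must be gone exactly at the end of the block, the removal would have to move into __exit__ instead.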


Remark on the architecture.

The use of a factory seems like an unnecessary complication to me. IMO, download_and_cache_big_dataset could just be a function returning a BigFileClass instance.
