Unit-testing: Mocking a subprocess running "aws s3 sync" with Python
My project needs to download quite a few files regularly before processing them. I tried coding the download directly in Python, but it is horribly slow considering the amount of data in the buckets.
I decided to use a subprocess running aws-cli because boto3 still doesn't have a sync functionality. I know using a subprocess with aws-cli is not ideal, but it really is useful and works extremely well out of the box.
One of the perks of aws-cli is that I can see the progress in stdout, which I am getting with the following code:
    import os
    import subprocess
    from pathlib import Path

    def download_bucket(bucket_url, dir_name, dest):
        """Download all the files from a bucket into a directory."""
        path = Path(dest) / dir_name
        bucket_dest = str(os.path.join(bucket_url, dir_name))
        with subprocess.Popen(["aws", "s3", "sync", bucket_dest, path], stdout=subprocess.PIPE, bufsize=1, universal_newlines=True) as p:
            # Stream the aws-cli progress output line by line.
            for b in p.stdout:
                print(b, end='')
        # Leaving the with block waits for the process, so returncode is set here.
        if p.returncode != 0:
            raise subprocess.CalledProcessError(p.returncode, p.args)
Now, I want to make sure that I test this function, but I am blocked here because:
- [...] aws s3 sync can hit it?
- [...] download_bucket function?
Until now, my attempt was to create a fake bucket and pass it to my download_bucket function. This way, I thought that aws s3 sync would still be working, albeit locally:
    import boto3
    from moto import mock_s3

    def test_download_s3(tmpdir):
        tmpdir.join('frankendir').ensure()
        with mock_s3():
            conn = boto3.resource('s3', region_name='us-east-1')
            conn.create_bucket(Bucket='cool-bucket.us-east-1.dev.000000000000')
            s3 = boto3.client('s3', region_name="us-east-1")
            s3.put_object(Bucket='cool-bucket.us-east-1.dev.000000000000', Key='frankendir', Body='has no files')
            body = conn.Object('cool-bucket.us-east-1.dev.000000000000', 'frankendir').get()[
                'Body'].read().decode("utf-8")
            download_bucket('s3://cool-bucket.us-east-1.dev.000000000000', 'frankendir', tmpdir)
            #assert tmpdir.join('frankendir').join('has not files').exists()
            assert body == 'has no files'
But I get the following error:
    fatal error: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.
My questions are the following:
- [...] aws s3 sync and return some files?
- [...] s3://bucketurl, a dir in that bucket and a local dir, the files contained within s3://bucketurl/dir are downloaded to my local dir.
Thank you for your help, I hope that I am not all over the place.
A much better approach is to use moto when faking or testing S3. You can check out their documentation or look at a test code example I did: https://github.com/pksol/pycon-go-beyond-mocks/blob/main/test_s3_fake.py
If you have a few minutes, you can view this short video of me explaining the benefits of using moto vs trying to mock.
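If the goal is only to check how download_bucket drives aws-cli, without touching S3 at all, the subprocess call itself can be replaced in the test. The following is a minimal sketch using unittest.mock from the standard library, not a definitive solution. make_fake_popen and the two test functions are hypothetical names, the bucket URL and local path are placeholders, and download_bucket is repeated from the question (with the source URL built by plain string formatting) so that the block runs on its own.

```python
import subprocess
from pathlib import Path
from unittest import mock


def download_bucket(bucket_url, dir_name, dest):
    """Copy of the question's function, so the sketch is self-contained."""
    path = Path(dest) / dir_name
    # Assumption: bucket_url has no trailing slash.
    bucket_dest = f"{bucket_url}/{dir_name}"
    with subprocess.Popen(
        ["aws", "s3", "sync", bucket_dest, str(path)],
        stdout=subprocess.PIPE, bufsize=1, universal_newlines=True,
    ) as p:
        for line in p.stdout:
            print(line, end="")
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, p.args)


def make_fake_popen(lines, returncode):
    """Hypothetical helper: build a stand-in for subprocess.Popen."""
    proc = mock.MagicMock()
    proc.__enter__.return_value = proc  # "with Popen(...) as p" yields proc
    proc.stdout = iter(lines)           # lines the fake process "prints"
    proc.returncode = returncode
    return mock.MagicMock(return_value=proc)


def test_download_bucket_invokes_sync():
    fake_popen = make_fake_popen(["fake progress line\n"], 0)
    with mock.patch("subprocess.Popen", fake_popen):
        download_bucket("s3://bucketurl", "dir", "local")
    # First positional argument of the Popen call is the command list.
    command = fake_popen.call_args[0][0]
    assert command == [
        "aws", "s3", "sync", "s3://bucketurl/dir", str(Path("local") / "dir"),
    ]


def test_download_bucket_raises_on_failure():
    fake_popen = make_fake_popen([], 1)
    with mock.patch("subprocess.Popen", fake_popen):
        try:
            download_bucket("s3://bucketurl", "dir", "local")
        except subprocess.CalledProcessError as err:
            assert err.returncode == 1
        else:
            raise AssertionError("expected CalledProcessError")
```

A test like this only verifies the command line and the error handling. It does not prove that any file is actually downloaded, which is where a fake S3 backend remains the more faithful option.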