
Unit-testing: Mocking a subprocess running "aws s3 sync" with Python

My project needs to download quite a few files regularly before processing them. I tried coding it directly in Python, but it is horribly slow considering the amount of data in the buckets.

I decided to use a subprocess running aws-cli because boto3 still doesn't have sync functionality. I know using a subprocess with aws-cli is not ideal, but it really is useful and works extremely well out of the box.

One of the perks of aws-cli is that I can see the progress in stdout, which I am getting with the following code:

import os
import subprocess
from pathlib import Path

def download_bucket(bucket_url, dir_name, dest):
    """Download all the files from a bucket into a directory."""
    path = Path(dest) / dir_name
    bucket_dest = str(os.path.join(bucket_url, dir_name))
    with subprocess.Popen(["aws", "s3", "sync", bucket_dest, str(path)], stdout=subprocess.PIPE, bufsize=1, universal_newlines=True) as p:
        # Stream the CLI's progress output line by line.
        for b in p.stdout:
            print(b, end='')

    # The return code is available once the with block has waited for the process.
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, p.args)

Now, I want to make sure that I test this function, but I am stuck here because:

  1. I don't know the best way to test this kind of unusual behavior:
    • Am I supposed to actually create a fake local s3 bucket so that aws s3 sync can hit it?
    • Am I supposed to mock the subprocess call and not actually call my download_bucket function?

Until now, my attempt was to create a fake bucket and pass it to my download_bucket function. This way, I thought that aws s3 sync would still work, albeit locally:

import boto3
from moto import mock_s3

def test_download_s3(tmpdir):
    tmpdir.join('frankendir').ensure()
    with mock_s3():
        conn = boto3.resource('s3', region_name='us-east-1')
        conn.create_bucket(Bucket='cool-bucket.us-east-1.dev.000000000000')

        s3 = boto3.client('s3', region_name="us-east-1")
        s3.put_object(Bucket='cool-bucket.us-east-1.dev.000000000000', Key='frankendir', Body='has no files')

        body = conn.Object('cool-bucket.us-east-1.dev.000000000000', 'frankendir').get()[
            'Body'].read().decode("utf-8")

        download_bucket('s3://cool-bucket.us-east-1.dev.000000000000', 'frankendir', tmpdir)

        #assert tmpdir.join('frankendir').join('has not files').exists()
        assert body == 'has no files'

But I get the following error: fatal error: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.

My questions are the following:

  1. Should I continue to pursue this creation of a fake local s3 bucket?
    • If so, how am I supposed to get the credentials to work?
  2. Should I just mock the subprocess call, and how?
    • I am having a hard time understanding how mocking works and how it's supposed to be done. From my understanding, I would just fake a call to aws s3 sync and return some files?
  3. Is there another kind of unit test that would be enough that I didn't think of?
    • After all, I just want to know whether, when I pass a well-formed s3://bucketurl, a dir in that bucket and a local dir, the files contained within s3://bucketurl/dir are downloaded to my local dir.

Thank you for your help, I hope that I am not all over the place.

A much better approach is to use moto when faking / testing s3. You can check out their documentation or look at a test code example I did: https://github.com/pksol/pycon-go-beyond-mocks/blob/main/test_s3_fake.py
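One caveat for the setup in the question: moto's in-process mocks patch boto3 inside the test process only, so an aws CLI started through subprocess most likely still contacts real AWS, which would be consistent with the InvalidAccessKeyId error. A possible workaround is to run moto in its standalone server mode and point the CLI at it. The sketch below is not from the original post. The helper names build_sync_command and fake_credentials_env are hypothetical, and the endpoint http://localhost:5000 is an assumption about where such a server would listen. It only builds the command line and environment and does not start any server.

```python
import os


def build_sync_command(bucket_url, dir_name, dest, endpoint_url=None):
    """Build the argv for `aws s3 sync`, optionally aimed at a fake S3 endpoint."""
    # S3 URLs always use "/", so join the source by hand.
    source = f"{bucket_url.rstrip('/')}/{dir_name}"
    target = os.path.join(str(dest), dir_name)
    command = ["aws", "s3", "sync", source, target]
    if endpoint_url is not None:
        # --endpoint-url is a global AWS CLI option that overrides the service URL.
        command += ["--endpoint-url", endpoint_url]
    return command


def fake_credentials_env(base_env=None):
    """Return a copy of the environment with dummy credentials (hypothetical values)."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "AWS_ACCESS_KEY_ID": "testing",
        "AWS_SECRET_ACCESS_KEY": "testing",
        "AWS_DEFAULT_REGION": "us-east-1",
    })
    return env
```

Under these assumptions, download_bucket could accept an optional endpoint_url and call subprocess.Popen(build_sync_command(...), env=fake_credentials_env(), ...), so that production code runs without the extra flag while the test targets the fake server.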

If you have a few minutes, you can view this short video of me explaining the benefits of using moto vs trying to mock.
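For comparison, here is roughly what the pure mocking route could look like with the standard library's unittest.mock. This is a minimal sketch, not the approach recommended above: download_bucket is a lightly adapted copy of the function from the question, and make_fake_popen and run_with_fake are hypothetical helpers. Such a test only checks that the right command is built and that a non-zero exit code raises. It never proves that files are actually downloaded.

```python
import subprocess
from pathlib import Path
from unittest import mock


def download_bucket(bucket_url, dir_name, dest):
    """Copy of the question's function, repeated so the sketch runs on its own."""
    path = Path(dest) / dir_name
    bucket_dest = f"{bucket_url}/{dir_name}"
    with subprocess.Popen(["aws", "s3", "sync", bucket_dest, str(path)],
                          stdout=subprocess.PIPE, bufsize=1,
                          universal_newlines=True) as p:
        for line in p.stdout:
            print(line, end="")
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, p.args)


def make_fake_popen(lines, returncode):
    """Build a stand-in for subprocess.Popen that never starts a process."""
    process = mock.MagicMock()
    # "with Popen(...) as p" should hand back the same fake process object.
    process.__enter__.return_value = process
    process.stdout = iter(lines)
    process.returncode = returncode
    process.args = ["aws", "s3", "sync"]
    return mock.MagicMock(return_value=process)


def run_with_fake(lines, returncode, *args):
    """Call download_bucket with subprocess.Popen replaced by the fake."""
    fake_popen = make_fake_popen(lines, returncode)
    with mock.patch("subprocess.Popen", fake_popen):
        download_bucket(*args)
    return fake_popen
```

A test would then inspect fake_popen.call_args to assert on the command that was built, and use pytest.raises(subprocess.CalledProcessError) around a call made with a non-zero return code.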

