
Downloading a large text file from S3 with boto3

The analytics team at my company uploads a CSV file to S3 every day; it is usually around 300 MB in size, but ever-increasing. A Lambda function I have to implement needs to read this file and process each line.

My main concern is that the sheer size of the file may cause memory problems in the execution context of my Lambda. Is there any way with boto3 to download this file from S3 as a stream and read it as it is being downloaded? If not, which approach should I follow to tackle this situation?
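For reference, a minimal sketch of streaming with plain boto3, assuming a hypothetical bucket name, key, and placeholder process() function: get_object returns the body as a botocore StreamingBody, whose iter_lines() yields the object line by line without buffering the whole file in memory.

```python
import boto3

# Hypothetical bucket and key -- replace with your own values.
BUCKET = "analytics-bucket"
KEY = "daily-report.csv"

s3 = boto3.client("s3")
response = s3.get_object(Bucket=BUCKET, Key=KEY)

# response["Body"] is a botocore StreamingBody; iter_lines() reads the
# object in chunks and yields one line at a time, so the full file is
# never held in memory at once.
for line in response["Body"].iter_lines():
    process(line.decode("utf-8"))  # process() is a placeholder for your per-line logic
```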

Thanks in advance.

Following up on my own question, I found smart_open: https://github.com/RaRe-Technologies/smart_open/tree/master/smart_open. It handles my problem in a very elegant way.
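For illustration, a short sketch of what that usage might look like, assuming a hypothetical S3 URI and the same placeholder process() function; smart_open's open() streams the object rather than downloading it in full:

```python
from smart_open import open  # pip install smart_open[s3]

# Hypothetical S3 URI -- smart_open streams the object, so lines are
# read as they are downloaded instead of after a full download.
with open("s3://analytics-bucket/daily-report.csv", "r") as f:
    for line in f:
        process(line)  # process() is a placeholder for your per-line logic
```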
