Is it possible to use readlines with boto3?

I'm trying to run a diff on two files that are stored in S3, and would like to avoid downloading the files if possible.

The sample code I am working with looks like this:

import difflib

# Read both files into lists of lines; the handles are closed on exit
with open('sample1.csv', 'r') as file1, open('sample2.csv', 'r') as file2:
    diff = difflib.ndiff(file1.readlines(), file2.readlines())

I see that with the boto3 package I can open a file from S3, but how can I pass the equivalent of file1.readlines() and file2.readlines() into the ndiff function?

For future readers, I'll answer the exact question "Is it possible to use readlines with boto3?"

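import io

import boto3

# Set up the S3 client; bucket and key stand for your own values
s3_client = boto3.client('s3')

body = s3_client.get_object(Bucket=bucket, Key=key)['Body']
# _raw_stream is a private attribute of the returned body object
stream = io.BufferedReader(body._raw_stream)
stream.readlines()  # returns a list of bytes objects, one per line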
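
To connect this back to the original ndiff call, here is a minimal sketch of diffing two binary streams. The helper name diff_binary_streams and the UTF-8 encoding are assumptions for illustration, and io.BytesIO objects stand in for the two S3 streams so the example runs without AWS access. The decoding step matters because a binary stream yields bytes, while the question's code passed text lines to ndiff.

```python
import difflib
import io


def diff_binary_streams(stream1, stream2, encoding="utf-8"):
    """Return ndiff output for two binary file-like objects as a list of str.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    # TextIOWrapper decodes the bytes so that readlines() returns str lines
    text1 = io.TextIOWrapper(stream1, encoding=encoding)
    text2 = io.TextIOWrapper(stream2, encoding=encoding)
    return list(difflib.ndiff(text1.readlines(), text2.readlines()))


# In-memory stand-ins for the two S3 objects
left = io.BytesIO(b"id,value\na,1\nb,2\n")
right = io.BytesIO(b"id,value\na,1\nb,3\n")

diff = diff_binary_streams(left, right)
print("".join(diff), end="")
```

With real S3 objects, the io.BufferedReader streams built as in the snippet above would presumably be passed in place of the BytesIO stand-ins. Both files are still read fully into memory, since ndiff needs complete sequences of lines.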

As indicated by comments on the question, readlines() pulls everything into memory. To limit that, you can pass it a hint, so that "no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint." ( https://docs.python.org/2/library/io.html#io.IOBase.readlines )
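
As a rough illustration of the hint, the sketch below reads a stream in batches instead of all at once. The generator name iter_line_batches and the tiny hint value are assumptions chosen for the example, and an in-memory io.BytesIO stands in for the S3 stream.

```python
import io


def iter_line_batches(stream, hint):
    """Yield lists of lines, each list holding roughly `hint` bytes of data.

    Hypothetical helper: readlines(hint) stops reading once the lines
    collected so far exceed the hint, so each call returns a bounded batch.
    """
    while True:
        batch = stream.readlines(hint)
        if not batch:
            # An empty list means the stream is exhausted
            break
        yield batch


# In-memory stand-in for a stream wrapped around an S3 object body
data = b"aa\nbb\ncc\ndd\nee\n"
stream = io.BufferedReader(io.BytesIO(data))

batches = list(iter_line_batches(stream, hint=4))
for batch in batches:
    print(batch)
```

Batching like this suits line-by-line processing. It does not by itself reduce memory use for ndiff, which needs both complete lists of lines.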
