Python, AWS S3: how to read a file of JSON lines
I have a file of JSON objects in an S3 bucket (one JSON object per line). I am having trouble reading them correctly. Here is what I am doing:
s3 = boto3.client('s3')
response = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)
file = response['Body']
for line in file:
    data_json = json.loads(line, encoding='utf-8')
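In this case it ignores the \n characters and reads a chunk of text as if it were a single line.
How do I correctly read every JSON object, one per line, from the file?
Example of the input file contents (a file containing several JSON objects, each on its own line):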
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A001US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A002US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
...
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A003US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
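boto3's get_object returns a StreamingBody object as the value of the Body key in the returned dictionary.
One of that object's methods is iter_lines, which lets you iterate over the lines of the response as it is being read. From there you can call json.loads on each line: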
for line in file.iter_lines():
    data = json.loads(line)
    print(data)
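As a minimal sketch of the approach above, the loop can be wrapped in a small generator. The names iter_json_lines and read_s3_json_lines are hypothetical, and skipping blank lines is an assumption about the file rather than something the question states. As far as I know, iterating directly over a StreamingBody in current botocore versions yields fixed-size byte chunks rather than lines, which would explain why the loop in the question sees several records at once.

```python
import json


def iter_json_lines(lines):
    """Yield one parsed object per non-empty line (bytes or str)."""
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines carry no record
        yield json.loads(line)


def read_s3_json_lines(bucket, key):
    """Stream a JSON Lines object from S3 and parse it line by line."""
    import boto3  # imported lazily so the parser above works without boto3

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    # iter_lines() splits the stream on line breaks as it is read
    yield from iter_json_lines(body.iter_lines())
```

Because both functions are generators, records are produced one at a time and the whole object never has to sit in memory.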
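get_object returns a botocore.response.StreamingBody.
If your function cannot work with the raw byte stream, you need to call .read() on it
(see the documentation) and then split the result into lines yourself.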
body = s3.get_object(Bucket=SOURCE, Key=key)['Body'].read()
# read() returns the whole object as bytes; split it into lines before parsing
for line in body.splitlines():
    json_data = json.loads(line)
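The read-everything variant can be tried without S3 at all. The sketch below uses a hard-coded bytes value as a stand-in for what .read() would return; the helper name parse_json_lines_blob and the sample payload are made up for illustration.

```python
import json


def parse_json_lines_blob(raw):
    """Parse a whole JSON Lines payload (bytes) that is already in memory."""
    # bytes.splitlines() splits on \n, \r\n and \r; empty lines are skipped
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Stand-in for s3.get_object(...)['Body'].read()
raw = b'{"AccountCode": "A001US"}\n{"AccountCode": "A002US"}\n'
for record in parse_json_lines_blob(raw):
    print(record["AccountCode"])
```

Note that .read() loads the entire object into memory, so for large files iterating with iter_lines is likely the better fit.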