How to convert CSV to JSON with Python on AWS Lambda?

I have a Lambda function that takes a CSV file uploaded to one bucket, converts it to JSON, and saves the result to another bucket. Here is my code:

import json
import os
import boto3
import csv

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']
        s3 = boto3.client('s3')
        csvfile = s3.get_object(Bucket=bucket, Key=file_key)
        csvcontent = csvfile['Body'].read().split(b'\n')

        data = []
        csv_file = csv.DictReader(csvcontent)
        print(csv_file)
        data = list(csv_file)

        os.chdir('/tmp')
        JSON_PATH = file_key[6:] + ".json"
        print(data)
        with open(JSON_PATH, 'w') as output:
          json.dump(data, output)
          bucket_name = 'xxx'
          s3.upload_file(JSON_PATH, bucket_name, JSON_PATH)

The problem is that although the file converts to JSON when I test this locally on my machine, running the Lambda function produces the following error:

[ERROR] Error: iterator should return strings, not bytes (did you open the file in text mode?)
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 19, in lambda_handler
    data = list(csv_file)
  File "/var/lang/lib/python3.7/csv.py", line 111, in __next__
    self.fieldnames
  File "/var/lang/lib/python3.7/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)

Can someone help me understand why this happens? I have been trying to find a solution for a while and I don't understand what the problem is. I appreciate any help you can provide.

The result of read() on the Body returned by s3.get_object() is bytes, not a string. csv.DictReader() expects strings rather than bytes, which is why it fails.

You can decode the result of read() into a string using the decode() method with the correct encoding. The following would be a fix:

Change this:

 csvcontent = csvfile['Body'].read().split(b'\n')

to this:

 csvcontent = csvfile['Body'].read().decode('utf-8')
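
The bytes-versus-strings failure can be reproduced locally without S3. The following is a minimal sketch, assuming a small made-up CSV payload (`raw`) standing in for the object body. It also shows that the decoded text still has to be split into lines, because csv.DictReader iterates over whatever it is given, and iterating over a plain string yields single characters.

```python
import csv

# Hypothetical payload standing in for csvfile['Body'].read()
raw = b"name,age\nAda,36\n"

# 1) Lines of bytes: the csv module rejects them with csv.Error.
try:
    list(csv.DictReader(raw.split(b"\n")))
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

# 2) A decoded string passed directly: no exception, but each character
#    is treated as its own line, so the rows are not the intended ones.
per_character = list(csv.DictReader(raw.decode("utf-8")))

# 3) Decoded and split into lines: one dict per data row.
rows = list(csv.DictReader(raw.decode("utf-8").splitlines()))

print(error_message)
print(rows)
```

Using splitlines() rather than split('\n') is a design choice here: it also handles Windows-style line endings and does not leave an empty string at the end of the list.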

A good way to debug these problems is to use the type() function to check what type your variable is. In your case, csvcontent is a list produced by split(b'\n'), so you can find the problem by trying print(type(csvcontent[0])): it would show that each line is a bytes object rather than a str.
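
As a quick illustration of that check, here is a small sketch using a made-up payload (`raw`) in place of the real object body; the variable names are only for this example.

```python
# Hypothetical payload standing in for csvfile['Body'].read()
raw = b"name,age\nAda,36\n"

byte_lines = raw.split(b"\n")                # what the original code builds
text_lines = raw.decode("utf-8").splitlines()  # decoded alternative

print(type(raw))            # <class 'bytes'>
print(type(byte_lines))     # <class 'list'>
print(type(byte_lines[0]))  # <class 'bytes'>
print(type(text_lines[0]))  # <class 'str'>
```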

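Just a small tweak to make it work. csv.DictReader iterates over lines, so the decoded string still needs to be split: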

csvcontent = csvfile['Body'].read().decode().split('\n')
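
Putting the pieces together, the conversion might be sketched as below. This is not the original poster's final code: the helper names `csv_bytes_to_json` and `convert_object`, the destination key naming, and the choice to upload with put_object instead of writing to /tmp are all assumptions made for illustration. The `s3` argument is expected to be a boto3 S3 client, as created by boto3.client('s3') in the question.

```python
import csv
import io
import json


def csv_bytes_to_json(raw, encoding="utf-8"):
    """Decode CSV bytes and return a JSON array with one object per row."""
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return json.dumps(list(reader))


def convert_object(s3, src_bucket, key, dest_bucket):
    """Read a CSV object from S3 and store its JSON form in dest_bucket.

    `s3` is assumed to be a boto3 S3 client; the ".json" suffix is a
    hypothetical naming choice for this sketch.
    """
    body = s3.get_object(Bucket=src_bucket, Key=key)["Body"].read()
    dest_key = key + ".json"
    s3.put_object(
        Bucket=dest_bucket,
        Key=dest_key,
        Body=csv_bytes_to_json(body).encode("utf-8"),
    )
    return dest_key
```

Inside lambda_handler, this could be called once per record, for example convert_object(s3, bucket, file_key, 'xxx'). Keeping the conversion in a function that takes bytes makes it easy to test locally without any AWS access.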
