
Convert txt file to csv with lambda s3 Aws

I have this code that must take a TXT file from a source bucket and convert it to CSV in a destination bucket. The response says that the variable or object (z) that should contain the CSV file cannot be opened because it is null. It seems that the code I use is not transforming the object correctly. Please, I need help to correct it.

The code is the following:

import pandas as pd
import json
import boto3
from io import BytesIO

def lambda_handler(evenBytesIOt, context):

    s3_resource = boto3.resource('s3')
    source_bucket = 'testsigma2'
    target_bucket = 'testsigma3'

    my_bucket = s3_resource.Bucket(source_bucket)

    for file in my_bucket.objects.all():
        if(str(file.key).endswith('.txt')):

            zip_obj = s3_resource.Object(bucket_name=source_bucket, key=file.key)

            buffer = BytesIO(zip_obj.get()['Body'].read())

            dataframe1 = pd.read_csv(buffer)
            z = dataframe1.to_csv(buffer, index=None)

            response = s3_resource.meta.client.upload_fileobj(
                z.open(filename),
                Bucket = target_bucket,
                key = f'{filename}'
            )

        else:
            print(file.key + 'is not a zip file.')
Response
{
  "errorMessage": "'NoneType' object has no attribute 'open'",
  "errorType": "AttributeError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n    z.open(filename),\n"
  ]
}

It looks like you are trying to open the z object after calling the to_csv method, but to_csv does not return a file object. When you pass it a buffer, it writes the CSV data directly to that buffer and returns None. After calling to_csv, call the seek method on the buffer to reset the position of the file pointer to the beginning of the file:

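dataframe1 = pd.read_csv(buffer)
# to_csv writes into buffer and returns None, so there is nothing to assign
dataframe1.to_csv(buffer, index=None)

# Reset the position of the file pointer to the beginning of the file
buffer.seek(0)

# filename is not defined in the question; it must be set first (for example from file.key)
response = s3_resource.meta.client.upload_fileobj(
             buffer,
             Bucket=target_bucket,
             Key=f'{filename}'
          )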

You can then use the buffer object as the file object to be uploaded to S3.
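One caveat with reusing the same buffer for reading and writing: after read_csv the file pointer may already sit at the end of the original text, so the CSV output could end up appended after the TXT bytes. A minimal sketch that sidesteps this by writing into a fresh buffer is below. The helper name txt_to_csv_buffer and the sample data are made up for illustration, and it assumes the TXT file is comma-delimited, which is what pd.read_csv expects by default.

```python
from io import BytesIO, StringIO

import pandas as pd


def txt_to_csv_buffer(raw: bytes) -> BytesIO:
    """Parse delimited text bytes and return a NEW buffer holding CSV bytes.

    Hypothetical helper, not part of pandas or boto3.
    """
    dataframe = pd.read_csv(BytesIO(raw))
    # Called without a path or buffer, to_csv returns the CSV text as a str.
    csv_text = dataframe.to_csv(index=False)
    # A BytesIO created from bytes starts at position 0, ready to be uploaded.
    return BytesIO(csv_text.encode("utf-8"))


# Made-up sample standing in for the body of an S3 object.
sample = b"name,value\nalpha,1\nbeta,2\n"
out = txt_to_csv_buffer(sample)

# For comparison: given a buffer, to_csv writes into it and returns None,
# which is why z in the question is None.
result = pd.read_csv(BytesIO(sample)).to_csv(StringIO(), index=False)
print(result)
```

The returned buffer can be passed straight to upload_fileobj in place of the reused one.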

The to_csv method on a pandas DataFrame doesn't return the buffer; it returns None and writes to the buffer you pass in. So z is None, and calling z.open(filename) raises the AttributeError. Try passing buffer to upload_fileobj instead. Additionally, I don't think filename is defined anywhere, so be aware of that.
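To make the filename point concrete, here is a rough sketch of deriving the target key from the source key and passing the buffer to upload_fileobj. The helpers csv_key_for and upload_csv are hypothetical names, and the client is passed in as a parameter so the snippet itself does not need AWS credentials. Note that the boto3 keyword arguments are Bucket and Key with a capital letter.

```python
from io import BytesIO


def csv_key_for(txt_key: str) -> str:
    """Return the object key with a trailing .txt replaced by .csv."""
    if not txt_key.endswith(".txt"):
        raise ValueError(f"not a .txt key: {txt_key!r}")
    return txt_key[: -len(".txt")] + ".csv"


def upload_csv(s3_client, csv_buffer: BytesIO, target_bucket: str, source_key: str) -> str:
    """Upload csv_buffer to target_bucket under the derived .csv key.

    s3_client is assumed to behave like boto3.client("s3").
    """
    target_key = csv_key_for(source_key)
    # Rewind so the upload starts from the first byte of the buffer.
    csv_buffer.seek(0)
    s3_client.upload_fileobj(csv_buffer, Bucket=target_bucket, Key=target_key)
    return target_key


print(csv_key_for("reports/data.txt"))
```

Inside the loop from the question this would be called roughly as upload_csv(s3_resource.meta.client, buffer, target_bucket, file.key).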

For documentation on the specific resources you're using, check these out: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
