Convert txt file to csv with lambda s3 AWS
I have this code that should take a file from a TXT-type source bucket and convert it to CSV in a destination bucket. It returns as a response that the variable or object (z) that should contain the CSV file cannot be opened because it is null. It seems the code I am using is not transforming the object correctly. Please, I need help to correct it.
import pandas as pd
import json
import boto3
from io import BytesIO

def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')
    source_bucket = 'testsigma2'
    target_bucket = 'testsigma3'
    my_bucket = s3_resource.Bucket(source_bucket)
    for file in my_bucket.objects.all():
        if str(file.key).endswith('.txt'):
            zip_obj = s3_resource.Object(bucket_name=source_bucket, key=file.key)
            buffer = BytesIO(zip_obj.get()['Body'].read())
            dataframe1 = pd.read_csv(buffer)
            z = dataframe1.to_csv(buffer, index=None)
            response = s3_resource.meta.client.upload_fileobj(
                z.open(filename),
                Bucket=target_bucket,
                key=f'(unknown)'
            )
        else:
            print(file.key + ' is not a zip file.')
Response
{
"errorMessage": "'NoneType' object has no attribute 'open'",
"errorType": "AttributeError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n z.open(filename),\n"
]
}
It looks like you are trying to open the z object after calling the to_csv method, but to_csv does not return a file object. Instead, it writes the CSV data directly to the buffer object that you provided as an argument. After calling to_csv, call the seek method on the buffer to reset the file pointer to the beginning of the file:
dataframe1 = pd.read_csv(buffer)
dataframe1.to_csv(buffer, index=None)
# Reset the position of the file pointer to the beginning of the file
buffer.seek(0)
response = s3_resource.meta.client.upload_fileobj(
    buffer,
    Bucket=target_bucket,
    key=f'(unknown)'
)
You can then use the buffer object as the file object to be uploaded to S3.
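Putting the pieces together, a corrected handler might look like the sketch below. It is an assumption-laden sketch, not the asker's exact code: the conversion is factored into a small helper, the CSV is written into a fresh buffer (reusing the buffer the TXT bytes were read from would leave the original bytes in front of the CSV), and since the original post does not show the intended destination key, the source key with a `.csv` suffix is used as a stand-in.

```python
from io import BytesIO

import pandas as pd


def txt_to_csv_buffer(raw: bytes) -> BytesIO:
    """Parse raw TXT bytes with pandas and return a fresh CSV-encoded binary buffer."""
    dataframe = pd.read_csv(BytesIO(raw))
    # to_csv with no path argument returns the CSV as a string; encode it
    # into a new buffer so the original TXT bytes are not mixed in.
    return BytesIO(dataframe.to_csv(index=False).encode('utf-8'))


def lambda_handler(event, context):
    # Imported inside the handler so the conversion helper above can be
    # exercised without an AWS environment.
    import boto3

    s3_resource = boto3.resource('s3')
    source_bucket = 'testsigma2'
    target_bucket = 'testsigma3'
    for file in s3_resource.Bucket(source_bucket).objects.all():
        if str(file.key).endswith('.txt'):
            obj = s3_resource.Object(bucket_name=source_bucket, key=file.key)
            csv_buffer = txt_to_csv_buffer(obj.get()['Body'].read())
            # Assumed target key: same name with a .csv extension; the
            # original post does not show the intended key.
            s3_resource.meta.client.upload_fileobj(
                csv_buffer,
                Bucket=target_bucket,
                Key=file.key[:-4] + '.csv',
            )
        else:
            print(file.key + ' is not a txt file.')
```

Note that `upload_fileobj` takes the keyword `Key` (capital K); the lowercase `key` in the original snippet would also have raised an error once the `z.open` problem was fixed.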
The to_csv method on a pandas dataframe doesn't return the buffer; instead it returns None and writes to buffer. Thus when it returns None and you try to open filename, it will error. Try passing the buffer to upload_fileobj. Additionally, I don't think filename is defined anywhere, so be aware of that.
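That None return is easy to verify in isolation. The small check below (variable names are illustrative) shows that when to_csv is given a buffer it returns None and the CSV text lands in the buffer itself:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With a buffer argument, to_csv writes into the buffer and returns None.
buf = io.StringIO()
result = df.to_csv(buf, index=False)
print(result is None)          # True

# Rewind before reading (or uploading) the buffer's contents.
buf.seek(0)
print(buf.readline().strip())  # a,b
```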
For documentation on the specific resources you're using, check these out: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html