Reading contents from a gzip file available in AWS S3

Question

How can I read the contents of a gzip file stored in AWS S3 into a pandas DataFrame in Python?

I want to convert the file contents into a DataFrame.

Answer 1

If you are trying to load JSON data into a DataFrame, here is the code.

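import gzip
from io import StringIO

import boto3
import pandas as pd

# Fill in your credentials; 'bucket_name' and 'Folder name' are placeholders.
resource = boto3.resource('s3', aws_access_key_id='',
                          aws_secret_access_key='')
# The original snippet used `client` without defining it; reuse the
# low-level client that belongs to the resource.
client = resource.meta.client

list_keys = []
lst = []
# Note: list_objects returns at most 1000 keys per call.
for key in client.list_objects(Bucket='bucket_name', Prefix='Folder name')['Contents']:
    list_keys.append(key["Key"])

for key in list_keys:
    try:
        obj = resource.Object("bucket_name", key)
        # Decompress the object body, then parse it as line-delimited JSON.
        with gzip.GzipFile(fileobj=obj.get()["Body"]) as gzipfile:
            temp_data = pd.read_json(StringIO(gzipfile.read().decode('UTF-8')), lines=True)
            lst.append(temp_data)
    except Exception as e:
        # Report the failing key instead of silently ignoring the error.
        print(f"Skipping {key}: {e}")

# Combine the per-file frames into a single DataFrame.
df = pd.concat(lst, ignore_index=True)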
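
To check the decompress-and-parse step without touching S3, the same logic can be sketched with the standard library alone. This is only an illustration under a few assumptions: the helper name read_gzip_json_lines is hypothetical, the in-memory sample payload stands in for an S3 object body, and each file is assumed to hold one JSON object per line (as lines=True implies in the code above).

```python
import gzip
import io
import json


def read_gzip_json_lines(fileobj):
    """Decompress a gzip stream and parse one JSON object per non-empty line.

    `fileobj` is any binary file-like object with a read() method
    (hypothetical helper, not part of boto3 or pandas).
    """
    records = []
    with gzip.GzipFile(fileobj=fileobj) as gz:
        # Decode the decompressed bytes as UTF-8 and iterate line by line.
        for line in io.TextIOWrapper(gz, encoding="utf-8"):
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Build a small gzip payload in memory to stand in for an S3 object body.
sample = b'{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
body = io.BytesIO(gzip.compress(sample))
records = read_gzip_json_lines(body)
print(records)
```

The resulting list of dictionaries can be passed to pd.DataFrame(records) to get a DataFrame equivalent to what pd.read_json(..., lines=True) produces for simple flat records.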

Solution 1 | 0 | Accepted | 2022-11-17 11:55:16
