EmptyDataError: No columns to parse from file when reading multiple CSV files from an S3 bucket into a pandas DataFrame

I have a source S3 bucket containing around 500 CSV files. I want to move those files to another S3 bucket and clean up the data before moving them, so I am trying to read each file into a pandas DataFrame. My code works fine and returns DataFrames for a few files, but then it suddenly breaks with the error "EmptyDataError: No columns to parse from file".

from io import BytesIO

import boto3
import pandas as pd

sts_client = boto3.client('sts', region_name='us-east-1')
client = boto3.client('s3')

bucket = 'source bucket'
folder_path = 'mypath'

def get_keys(bucket,folder_path):
    keys = []
    resp = client.list_objects(Bucket=bucket, Prefix=folder_path)
    for obj in resp['Contents']:
        keys.append(obj['Key'])
    return keys

files = get_keys(bucket,folder_path)
print(files)

for file in files:
    f = BytesIO()
    client.download_fileobj(bucket, file, f)
    f.seek(0)
    obj = f.getvalue()
    my_df = pd.read_csv(f ,header=None, escapechar='\\', encoding='utf-8', engine='python')
    # The files have no header row, so assign column names explicitly.
    my_df.columns = ['col1', 'col2','col3','col4','col5']
    print(my_df.head())

Thanks in advance!

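At least one of your files has a size of zero, which is why pandas finds no columns to parse. os.path.getsize(file) only works on local paths, so instead use a paginator with a JMESPath filter to list only the objects whose size is greater than zero: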

import boto3

client = boto3.client('s3', region_name='us-west-2')
paginator = client.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket='my-bucket')
filtered_iterator = page_iterator.search("Contents[?Size > `0`][]")
for key_data in filtered_iterator:
    print(key_data)
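
To connect this back to the loop in the question, here is a minimal sketch that skips zero-byte objects before downloading. The helper names iter_nonempty_keys and download_to_buffer are hypothetical, the client argument is assumed to be a boto3 S3 client, and the sketch uses the list_objects_v2 paginator with a plain Python size check in place of list_objects and the JMESPath filter shown above.

```python
from io import BytesIO


def iter_nonempty_keys(client, bucket, prefix=""):
    """Yield keys under `prefix` whose object size is greater than zero.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            if obj["Size"] > 0:
                yield obj["Key"]


def download_to_buffer(client, bucket, key):
    """Download one object into memory and rewind the buffer for reading."""
    buf = BytesIO()
    client.download_fileobj(bucket, key, buf)
    buf.seek(0)
    return buf
```

Each buffer returned by download_to_buffer can then be passed to pd.read_csv exactly as in the question. A file that contains only blank lines has a non-zero size and may still fail to parse, so wrapping the read in try/except pandas.errors.EmptyDataError is a reasonable extra guard.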
