

How to read only 5 records from s3 bucket and return it without getting all data of csv file

Hello guys, I know there are lots of similar questions here, but I have code that executes properly and returns five records. My question is: how do I avoid reading the entire file just to return the desired rows? Suppose the CSV file is gigabytes in size; I don't want to pull all of that data just to get 5 records, so please tell me how to do that. Also, if my code is not good, please explain why. Code:

import boto3
from botocore.client import Config
import pandas as pd

ACCESS_KEY_ID = 'something'
ACCESS_SECRET_KEY = 'something'
BUCKET_NAME = 'something'
Filename='dataRepository/source/MergedSeedData(Parts_skills_Durations).csv'

client = boto3.client("s3",
                     aws_access_key_id=ACCESS_KEY_ID,
                     aws_secret_access_key=ACCESS_SECRET_KEY)
obj = client.get_object(Bucket=BUCKET_NAME, Key=Filename)
Data = pd.read_csv(obj['Body'])
# data1 = Data.columns
# return data1
Data=Data.head(5)
print(Data)

This is my code, which runs fine and gets the 5 records from the S3 bucket, but as explained above it still reads the whole file. For any other query feel free to ask. Thanks in advance.

You can use the pandas capability of reading a file in chunks, loading only as much data as you need.

data_iter = pd.read_csv(obj['Body'], chunksize=5)  # lazy reader; nothing is parsed yet
data = data_iter.get_chunk()  # parses only the first 5 rows from the stream
print(data)
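As a runnable sketch of the idea (an in-memory CSV stands in for the real S3 streaming body here, so the example needs no credentials), the chunked read parses only the first 5 rows:

```python
import io

import pandas as pd

# Stand-in for obj['Body']: in the real case this is the streaming body
# returned by client.get_object(); here it is an in-memory CSV so the
# sketch runs without S3 access.
csv_body = io.StringIO("id,name\n" + "\n".join(f"{i},row{i}" for i in range(1000)))

data_iter = pd.read_csv(csv_body, chunksize=5)  # lazy reader, nothing parsed yet
data = data_iter.get_chunk()                    # parses only the first chunk

print(len(data))  # 5
```

If you only ever need the first few rows, `pd.read_csv(obj['Body'], nrows=5)` does the same thing in one call; with a streaming body, pandas should only need to consume the beginning of the stream.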

You can use an HTTP Range: header (see RFC 7233, which superseded the definition in RFC 2616), which takes a byte-range argument. The S3 GetObject API supports this, which lets you avoid reading/downloading the whole S3 file.

Sample code:

import boto3
obj = boto3.resource('s3').Object('bucket101', 'my.csv')
record_stream = obj.get(Range='bytes=0-1000')['Body']
print(record_stream.read())

This will return only the bytes in the range given in the header.

But you will need to convert the returned string into a DataFrame yourself, e.g. by splitting on the \n and delimiter characters present in the bytes coming from the .csv file, and by discarding the last line, which may be cut off mid-record.
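A minimal sketch of that conversion (the bytes and column names here are made up for illustration; a real range read usually ends mid-record, so the last partial line is dropped):

```python
import io

import pandas as pd

# Stand-in for record_stream.read(): the first bytes of a CSV fetched
# via a Range request. The final row ("4,del") is truncated mid-record.
raw = b"id,name\n1,alpha\n2,beta\n3,gamma\n4,del"

text = raw.decode("utf-8")
complete = text[: text.rfind("\n")]  # keep only whole lines, drop the cut-off tail

df = pd.read_csv(io.StringIO(complete))
print(df)
```

This keeps the 3 complete rows and discards the truncated one; in practice you would pick the byte range large enough to be sure it covers at least the 5 full records you need.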
