
Is it possible to get the contents of an S3 file without downloading it using boto3?

I am working on a process to dump files from a Redshift database, and would prefer not to have to download the files locally in order to process the data. I saw that Java has a StreamingObject class that does what I want, but I haven't seen anything similar in boto3.

If you have an S3 bucket named mybucket that contains a key named beer, here is how to fetch its value into memory without storing it in a local file:

import boto3
s3 = boto3.resource('s3')
# get() returns a dict; its 'Body' entry is a stream, and read() returns the raw bytes
print(s3.Object('mybucket', 'beer').get()['Body'].read())
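
Calling read() with no argument loads the whole object into memory at once. For large dumps, one possible alternative is to read the body in fixed-size chunks and process it line by line. The following is only a minimal sketch, not part of the original answer: iter_lines and iter_s3_lines are hypothetical helper names, the 64 KiB chunk size is an arbitrary assumption, and the data is assumed to be UTF-8 text with "\n" line endings.

```python
import io
from typing import BinaryIO, Iterator


def iter_lines(body: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield decoded lines from a binary stream, reading chunk_size bytes at a time.

    Works with any object that has a read(amt) method, which includes the
    stream returned under the 'Body' key of an S3 get() / get_object() call.
    """
    pending = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last "\n" is complete; the remainder waits
        # for the next chunk.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line.decode("utf-8")
    if pending:
        # Last line of the stream had no trailing newline.
        yield pending.decode("utf-8")


def iter_s3_lines(bucket: str, key: str) -> Iterator[str]:
    """Hypothetical wrapper: stream the lines of one S3 object."""
    import boto3  # imported here so iter_lines can be used without AWS access

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    yield from iter_lines(body)


if __name__ == "__main__":
    # Local demonstration with an in-memory stream instead of S3.
    for text in iter_lines(io.BytesIO(b"first\nsecond\nthird"), chunk_size=4):
        print(text)
```

With real credentials, iter_s3_lines('mybucket', 'beer') would be used in the same way as the in-memory demonstration; only one chunk plus any partial line is held in memory at a time.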

This may or may not be relevant to what you want to do, but for my situation one thing that worked well was using tempfile:

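import tempfile
import boto3
import PyPDF2

bucket_name = 'my_bucket'
key_name = 'my_file.pdf'  # hypothetical key; the original snippet left key_name undefined
s3 = boto3.resource('s3')
# The temporary file is deleted automatically when the with block exits.
with tempfile.NamedTemporaryFile() as temp:
    s3.Bucket(bucket_name).download_file(key_name, temp.name)
    with open(temp.name, 'rb') as pdfFileObj:
        # PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0
        pdfReader = PyPDF2.PdfReader(pdfFileObj)
        # [... do what you will with your file ...]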
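
If the goal is to avoid touching the disk at all, one option that may work is to download into an in-memory buffer with download_fileobj instead of a temporary file. This is a sketch under assumptions, not the answerer's method: fetch_to_memory is a hypothetical helper name, the client is assumed to be a boto3 S3 client created by the caller, and the object is assumed to be small enough to fit in memory.

```python
import io


def fetch_to_memory(client, bucket: str, key: str) -> io.BytesIO:
    """Download one S3 object into an in-memory buffer and rewind it.

    `client` is expected to be a boto3 S3 client (boto3.client('s3')).
    """
    buffer = io.BytesIO()
    client.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer
```

The returned buffer is a binary file-like object, so it can be handed to libraries that accept an open binary file, which includes the PDF reader used above.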

smart_open is a Python 3 library for efficient streaming of very large files from/to storage such as S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, or the local filesystem.

https://pypi.org/project/smart-open/

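import json

import boto3
import smart_open

# AWS_ACCESS_KEY_ID and AWS_SECRET_KEY must be defined elsewhere (not shown here).
client = boto3.client(service_name='s3',
                      aws_access_key_id=AWS_ACCESS_KEY_ID,
                      aws_secret_access_key=AWS_SECRET_KEY,
                      )
url = 's3://.............'
# The object is streamed line by line; each line is expected to hold one JSON document.
with smart_open.open(url, 'r', transport_params={'client': client}) as fin:
    for line in fin:
        data = json.loads(line)
        print(data)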
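
The loop above assumes every line holds a JSON document; json.loads raises an error on an empty line, which exported files sometimes contain at the end. A small sketch of a more tolerant parser follows. parse_json_lines is a hypothetical helper name, and skipping blank lines is an assumption about what the caller wants.

```python
import json
from typing import Any, Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on them
        yield json.loads(line)


if __name__ == "__main__":
    sample = ['{"id": 1}\n', '\n', '{"id": 2}\n']
    for record in parse_json_lines(sample):
        print(record)
```

Any iterable of text lines can be passed in, so the file object returned by smart_open.open should work in place of the sample list.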

I actually use this solution:

import boto3

s3_client = boto3.client('s3')

def get_content_from_s3(bucket: str, key: str) -> str:
  """Return the content of an S3 object as a string, without saving it locally
     param: bucket, s3 bucket
     param: key, path to the file, e.g. folder/subfolder/file.txt
  """
  s3_file = s3_client.get_object(Bucket=bucket, Key=key)['Body'].read()
  return s3_file.decode('utf-8').strip()
