How can I access s3 files in Python using urls?

Question

I want to write a Python script that will read and write files from S3 using their URLs, e.g. 's3://mybucket/file'. It would need to run locally and in the cloud without any code changes. Is there a way to do this?

Edit: There are some good suggestions here but what I really want is something that allows me to do this:

 myfile = open("s3://mybucket/file", "r")

and then use that file object like any other file object. That would be really cool. I might just write something like this for myself if it doesn't exist. I could build that abstraction layer on simples3 or boto.

Answer 1

For opening, it should be as simple as:

import urllib.request  # Python 3 (the Python 2 spelling was urllib.URLopener().open)
myurl = "https://s3.amazonaws.com/skyl/fake.xyz"
myfile = urllib.request.urlopen(myurl)  # file-like object that yields bytes

This will work with s3 if the file is public.

To write a file using boto, it goes a little something like this:

from boto.s3.connection import S3Connection

# AWS_KEY, AWS_SECRET, BUCKET and filename are placeholders you define yourself.
conn = S3Connection(AWS_KEY, AWS_SECRET)
bucket = conn.get_bucket(BUCKET)
destination = bucket.new_key(filename)
destination.set_contents_from_file(myfile)  # myfile: a file object open for reading
destination.make_public()

lemme know if this works for you :)
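
boto has since been succeeded by boto3, which Answer 2 uses. The following is a minimal sketch of the same upload with boto3, not a drop-in replacement for the code above. It assumes credentials are already configured in the environment; `upload_stream` and its `public` flag are hypothetical names introduced here for illustration, and the client is passed in rather than created inside the function.

```python
def upload_stream(client, fileobj, bucket, key, public=False):
    """Upload a binary file-like object to s3://bucket/key.

    `client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    `public=True` requests a public-read ACL, roughly what make_public() did.
    """
    extra_args = {"ACL": "public-read"} if public else None
    client.upload_fileobj(fileobj, bucket, key, ExtraArgs=extra_args)
    return "s3://{}/{}".format(bucket, key)
```

It could be called as `upload_stream(boto3.client("s3"), open("local.bin", "rb"), "mybucket", "file")`, where the file name, bucket and key are placeholders.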

Answer 2

Here's how they do it in awscli:

def find_bucket_key(s3_path):
    """
    This is a helper function that given an s3 path such that the path is of
    the form: bucket/key
    It will return the bucket and the key represented by the s3 path
    """
    s3_components = s3_path.split('/')
    bucket = s3_components[0]
    s3_key = ""
    if len(s3_components) > 1:
        s3_key = '/'.join(s3_components[1:])
    return bucket, s3_key


def split_s3_bucket_key(s3_path):
    """Split s3 path into bucket and key prefix.
    This will also handle the s3:// prefix.
    :return: Tuple of ('bucketname', 'keyname')
    """
    if s3_path.startswith('s3://'):
        s3_path = s3_path[5:]
    return find_bucket_key(s3_path)

You could then use it with code like this:

from awscli.customizations.s3.utils import split_s3_bucket_key
import boto3
client = boto3.client('s3')
bucket_name, key_name = split_s3_bucket_key(
    's3://example-bucket-name/path/to/example.txt')
response = client.get_object(Bucket=bucket_name, Key=key_name)

This doesn't address the goal of interacting with an S3 key as a file-like object, but it's a step in that direction.
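
One way to take the next step is to wrap the body returned by `get_object` in an in-memory buffer. This is only a sketch: `get_object_as_file` is a hypothetical helper name, it reads the whole object into memory (so it is unsuitable for very large objects), and it is read-only.

```python
import io


def get_object_as_file(client, bucket, key, encoding=None):
    """Return the content of s3://bucket/key as an in-memory file-like object.

    `client` is expected to be a boto3 S3 client. With encoding=None the
    result is a binary buffer; otherwise it is a text wrapper around it.
    """
    response = client.get_object(Bucket=bucket, Key=key)
    data = response["Body"].read()  # loads the entire object into memory
    buffer = io.BytesIO(data)
    if encoding is None:
        return buffer
    return io.TextIOWrapper(buffer, encoding=encoding)
```

The returned object supports `read()`, `readline()` and iteration like a regular file opened for reading.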

Answer 3

I haven't seen anything that works directly with S3 URLs, but you could use an S3 access library (simples3 looks decent) and some simple string manipulation:

>>> url = "s3:/bucket/path/"
>>> _, path = url.split(":", 1)
>>> path = path.lstrip("/")
>>> bucket, path = path.split("/", 1)
>>> print(bucket)
bucket
>>> print(path)
path/
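
The same split can be sketched with the standard library's `urllib.parse` instead of manual string slicing. `parse_s3_url` is a hypothetical helper name; the sketch accepts both the `s3://bucket/key` form and the single-slash `s3:/bucket/key` form used above, and it assumes keys contain no `?` or `#`, since `urlsplit` would treat those as query and fragment separators.

```python
from urllib.parse import urlsplit


def parse_s3_url(url):
    """Split an S3 URL into (bucket, key)."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError("not an s3 URL: {!r}".format(url))
    bucket = parts.netloc
    key = parts.path.lstrip("/")
    if not bucket:
        # Single-slash form (s3:/bucket/key): the bucket is the first path segment.
        bucket, _, key = key.partition("/")
    return bucket, key
```

The resulting pair can be passed to whichever S3 library you use.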

Answer 4

Try s3fs.

The first example in the docs:

>>> import s3fs
>>> fs = s3fs.S3FileSystem(anon=True)
>>> fs.ls('my-bucket')
['my-file.txt']
>>> with fs.open('my-bucket/my-file.txt', 'rb') as f:
...     print(f.read())
b'Hello, world'
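
Building on this, the `open("s3://mybucket/file", "r")` behaviour the question asks for could be approximated with a small dispatcher. This is a sketch under assumptions: `open_url` is a hypothetical name, s3fs is imported only when an `s3://` URL is seen, and credentials are assumed to come from the environment.

```python
def open_url(url, mode="rb"):
    """Open an s3:// URL through s3fs, or any other path with the built-in open()."""
    prefix = "s3://"
    if url.startswith(prefix):
        import s3fs  # imported lazily so local-only runs do not need s3fs installed

        fs = s3fs.S3FileSystem()  # assumption: credentials resolved from the environment
        return fs.open(url[len(prefix):], mode)
    return open(url, mode)
```

The same call then works for a local path and for `"s3://mybucket/file"`, so a script can switch between them by changing only the path string.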

Answer 5

You can use the Boto Python API to access S3 from Python. It's a good library. After you install Boto, the following sample program will work for you:

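>>> import boto
>>> from boto.s3.key import Key
>>> b = boto.connect_s3().get_bucket('mybucket')  # 'mybucket' is a placeholder
>>> k = Key(b)
>>> k.key = 'yourfile'
>>> k.set_contents_from_filename('yourfile.txt')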

You can find more information here: http://boto.cloudhackers.com/s3_tut.html#storing-data

Answer 6

http://s3tools.org/s3cmd works pretty well and supports the s3:// form of the URL structure you want. It does the business on Linux and Windows. If you need a native API to call from within a Python program, then http://code.google.com/p/boto/ is a better choice.

6 answers

solution1
15 2011-02-15 23:25:17

solution2
10 2017-06-01 20:53:38

solution3
3 2011-02-14 15:11:50

solution4
2 2019-08-12 19:20:15

solution5
1 2011-02-14 15:27:03

solution6
1 2011-02-14 17:17:03
