I have an S3 URL like the one below:
s3_filename is s3://xx/xx/y/z/ion.csv
If I have the bucket and key separately, I can read the file with the code below:
import boto3
import pandas as pd

def read_s3(bucket, key):
    # fetch the object and let pandas parse the streaming body as CSV
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(obj['Body'])
    return df
Since you appear to be using Pandas, please note that it actually uses s3fs under the hood. So, if your install is relatively recent and standard, you can directly do:
df = pd.read_csv(s3_path)
If you have some specific config for your bucket, for example special credentials, KMS encryption, etc., you may use an explicitly configured s3fs filesystem, for example:
fs = s3fs.S3FileSystem(
    key=my_aws_access_key_id,
    secret=my_aws_secret_access_key,
    s3_additional_kwargs={
        'ServerSideEncryption': 'aws:kms',
        'SSEKMSKeyId': my_kms_key,
    },
)
# note: KMS encryption only used when writing; when reading, it is automatic if you have access
with fs.open(s3_path, 'r') as f:
    df = pd.read_csv(f)
# here we write the same df at a different location, making sure
# it is using my_kms_key:
with fs.open(out_s3_path, 'w') as f:
    df.to_csv(f)
That said, if you really want to fetch the object yourself, and the question is just about how to remove a potential s3:// prefix and then split bucket/key, you could simply use:
bucket, key = re.sub(r'^s3://', '', s3_path).split('/', 1)
But that may miss more general cases and conventions handled by systems such as awscli or the very s3fs referenced above.
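As a slightly stricter alternative to the one-liner above, here is a minimal sketch based on the standard library's urllib.parse. The helper name split_s3_url is hypothetical (it is not part of boto3, botocore, or s3fs). It assumes plain s3://bucket/key URLs, does not handle access point ARNs, and would truncate keys that contain '?' or '#', since urlparse treats those as query and fragment separators.

```python
from urllib.parse import urlparse


def split_s3_url(s3_path):
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3_path)
    # Reject anything that is not an s3:// URL with a bucket name.
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError(f'not an s3:// URL: {s3_path!r}')
    # The path starts with '/', which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip('/')


bucket, key = split_s3_url('s3://xx/xx/y/z/ion.csv')
```

The resulting bucket and key can then be passed to a function such as read_s3(bucket, key) from the question.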
For more generality, you can take a look at how they do this in awscli. In general, doing so often provides a good indication of whether or not some functionality may already be built into boto3 or botocore. In this case, however, it would appear not (looking at a local clone of release-1.18.126). They simply do this from first principles: see awscli.customizations.s3.utils.split_s3_bucket_key.
From the regex that is eventually used in that code, you can infer that the kinds of cases awscli allows for s3_path are quite diverse indeed:
_S3_ACCESSPOINT_TO_BUCKET_KEY_REGEX = re.compile(
    r'^(?P<bucket>arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[:/][^/]+)/?'
    r'(?P<key>.*)$'
)
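To illustrate how such a regex can be combined with the plain bucket/key split, here is a simplified sketch. It is not awscli's actual code: split_bucket_key is a hypothetical name, the ARN and account number in the example are made up, and only the access point pattern quoted above is handled.

```python
import re

# Access point pattern as quoted above from awscli.
_S3_ACCESSPOINT_TO_BUCKET_KEY_REGEX = re.compile(
    r'^(?P<bucket>arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[:/][^/]+)/?'
    r'(?P<key>.*)$'
)


def split_bucket_key(s3_path):
    """Hypothetical helper: return (bucket, key) for an S3 path."""
    # Drop an optional s3:// prefix.
    if s3_path.startswith('s3://'):
        s3_path = s3_path[len('s3://'):]
    # Access point ARNs contain ':' and '/', so try the regex first.
    match = _S3_ACCESSPOINT_TO_BUCKET_KEY_REGEX.match(s3_path)
    if match:
        return match.group('bucket'), match.group('key')
    # Otherwise the bucket is everything before the first '/'.
    bucket, _, key = s3_path.partition('/')
    return bucket, key


print(split_bucket_key('s3://xx/xx/y/z/ion.csv'))
print(split_bucket_key(
    's3://arn:aws:s3:us-west-2:123456789012:accesspoint/my-ap/dir/file.csv'))
```

Trying the ARN pattern before the generic split matters because a naive split on the first '/' would cut an access point ARN in the middle.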