Pandas: How to access an in-house NetApp StorageGRID file

Question

I have a NetApp StorageGRID (S3) deployment in my company's infrastructure. I am new to S3. After processing a CSV file in pandas, I need to write the file to S3. The URL for the StorageGRID is https://myCompanys3.storage.net and the bucket is 'test_bucket'. I referred to https://stackoverflow.com/a/51777553/13065899

I followed these steps, based on other reading on Python/pandas/S3:

Created a .aws folder in my user folder (Windows laptop)
Created a credentials file with these entries:

'''

[default]
aws_access_key_id=myAccessKey
aws_secret_access_key=mySecretAccessKey

'''

Ran pip install s3fs
Wrote this line of code:

df.to_csv('https://myCompanys3.storage.net/test_bucket/myTest.csv')

Got this error: urllib.error.HTTPError: HTTP Error 403: Forbidden

Is the path given in to_csv above the correct way to construct the full path to the file?

All the examples I have seen so far start with 's3://' rather than a full URL.

Is 's3' a keyword that is required for any read/write to StorageGRID?

I also tried:

df.to_csv('s3://https://s3.medcity.net://hpg-dl-dev/PandasInvoiceTest.csv', index=False)

Got this error: Invalid bucket name "https:": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$"

Can someone help me with what I am missing? Perhaps an S3 configuration where I externalize the URL?

Thank you in advance.

Answer 1

Use boto3 to establish your connection and download the file, then stream the decoded string into pd.read_csv() using io.StringIO():

import io
import json
from pathlib import Path

import boto3
import pandas as pd

# Load the region and credentials from a local JSON file
with open(Path.cwd().joinpath("aws-secrets.json")) as f:
    cfg = json.load(f)

sess = boto3.session.Session(region_name=cfg["REGION_NAME"],
                             aws_access_key_id=cfg["ACCESS_ID"],
                             aws_secret_access_key=cfg["ACCESS_KEY"])

# Download the object, decode it to text, and parse it as CSV
df = pd.read_csv(io.StringIO(
    sess.resource("s3").Object("silicon-myfiles", "elevationdata.csv").get()["Body"].read().decode()
))
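
For the write case in the question, where the target is an in-house S3-compatible endpoint rather than AWS, one possible approach is to keep the 's3://bucket/key' form of the path and pass the server URL separately. The sketch below assumes pandas 1.2 or later with s3fs installed, and assumes that StorageGRID accepts the credentials from the [default] profile in the .aws credentials file. The helper names s3_target and write_csv are hypothetical; the endpoint, bucket, and file name are taken from the question.

```python
def s3_target(bucket, key, endpoint_url):
    """Return the s3:// path and the storage_options dict for a custom endpoint."""
    path = f"s3://{bucket}/{key}"
    # s3fs forwards client_kwargs to the underlying S3 client, so
    # endpoint_url points it at the in-house server instead of AWS.
    storage_options = {"client_kwargs": {"endpoint_url": endpoint_url}}
    return path, storage_options


def write_csv(df, bucket, key, endpoint_url):
    """Write a DataFrame as CSV to an S3-compatible endpoint (hypothetical helper)."""
    path, storage_options = s3_target(bucket, key, endpoint_url)
    # Credentials are expected to come from the default lookup chain,
    # for example the [default] profile in ~/.aws/credentials.
    df.to_csv(path, index=False, storage_options=storage_options)


# Example call, using the values from the question:
# write_csv(df, "test_bucket", "myTest.csv", "https://myCompanys3.storage.net")
```

In this form the 's3://' prefix only selects the S3 backend, and what follows it is always bucket/key, never the server URL. That is why a path such as 's3://https://...' is rejected with an invalid bucket name error.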
