
Pandas: How to access an in-house NetApp StorageGRID file

I have a NetApp StorageGRID (S3) in my company's infrastructure, and I am new to S3. After processing a CSV file in Pandas, I need to write it to S3. The URL for the storage grid is https://myCompanys3.storage.net and the bucket is 'test_bucket'. I referred to https://stackoverflow.com/a/51777553/13065899

I followed these steps, based on other reading on Python/Pandas/S3:

  1. Created a .aws folder in my user folder (Windows laptop)
  2. Created a credentials file in it with these entries:

[default]
aws_access_key_id=myAccessKey
aws_secret_access_key=mySecretAccessKey

  3. Ran pip install s3fs
  4. Wrote this line of code:

df.to_csv('https://myCompanys3.storage.net/test_bucket/myTest.csv')

Got this error: urllib.error.HTTPError: HTTP Error 403: Forbidden

Is the path given to to_csv above the correct way to construct the full path to the file?

All examples I have seen so far start with 's3://' and not a full URL.

Is 's3' a keyword that is needed for any read/write to the storage grid?
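On the 's3://' question: in pandas the 's3://' prefix is a URL scheme that hands the path to s3fs, and what follows it is the bucket name and object key, not a host name. A plain https:// path is treated as an ordinary HTTP URL, so the credentials file is not used. The sketch below shows one way the custom endpoint could be passed separately. It assumes pandas 1.2 or later (which accepts `storage_options`) and that s3fs forwards `client_kwargs` to the underlying S3 client; the helper names `s3_target` and `write_csv` are hypothetical.

```python
def s3_target(bucket, key, endpoint_url):
    """Build the s3:// path and the storage_options for a custom S3 endpoint.

    The endpoint URL goes into storage_options, never into the path itself.
    """
    path = f"s3://{bucket}/{key}"
    storage_options = {"client_kwargs": {"endpoint_url": endpoint_url}}
    return path, storage_options


def write_csv(df, bucket, key, endpoint_url):
    """Write a DataFrame to the bucket (assumes s3fs is installed)."""
    path, storage_options = s3_target(bucket, key, endpoint_url)
    df.to_csv(path, index=False, storage_options=storage_options)
```

With the values from the question, the call would look like `write_csv(df, "test_bucket", "myTest.csv", "https://myCompanys3.storage.net")`. Whether the 403 disappears also depends on the keys having write permission on the bucket.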

I then tried:

df.to_csv('s3://https://s3.medcity.net://hpg-dl-dev/PandasInvoiceTest.csv', index=False)

Got this error: Invalid bucket name "https:": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$"
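The error message is consistent with everything between 's3://' and the next slash being read as the bucket name, which for that path is "https:". As a rough illustration, a full URL can be split into the three pieces S3 tooling expects. This sketch assumes path-style addressing (bucket as the first path segment), and the function name `split_path_style_url` is hypothetical.

```python
from urllib.parse import urlsplit


def split_path_style_url(url):
    """Split https://host/bucket/key into (endpoint_url, bucket, key)."""
    parts = urlsplit(url)
    # First path segment is the bucket, the remainder is the object key
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"expected https://host/bucket/key, got {url!r}")
    endpoint_url = f"{parts.scheme}://{parts.netloc}"
    return endpoint_url, bucket, key
```

The bucket and key then form the 's3://bucket/key' path, while the endpoint is configured on the client.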

Can someone help me with what I am missing? Perhaps an S3 configuration where I externalize the URL?
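One simple way to externalize the URL is to read it from an environment variable in your own code and fall back to a default. This is only a sketch: the variable name `S3_ENDPOINT_URL` and the function `endpoint_from_env` are hypothetical and are not something pandas or s3fs reads on its own.

```python
import os

# Default taken from the question; override it per environment
DEFAULT_ENDPOINT = "https://myCompanys3.storage.net"


def endpoint_from_env(environ=None, var="S3_ENDPOINT_URL"):
    """Return the endpoint URL from the environment, or the default."""
    environ = os.environ if environ is None else environ
    # Strip a trailing slash so the value is consistent either way
    return environ.get(var, DEFAULT_ENDPOINT).rstrip("/")
```

The returned value would then be passed as the endpoint URL wherever the S3 client or filesystem is created.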

Thank you in advance.

  1. Use boto3 to establish your connection and download the file
  2. Stream the resulting string into pd.read_csv() using io.StringIO()

import io
import json
from pathlib import Path

import boto3
import pandas as pd

# Load the region and credentials from a local JSON file
with open(Path.cwd().joinpath("aws-secrets.json")) as f:
    cfg = json.load(f)

sess = boto3.session.Session(region_name=cfg["REGION_NAME"],
                             aws_access_key_id=cfg["ACCESS_ID"],
                             aws_secret_access_key=cfg["ACCESS_KEY"])

# Download the object, decode it to text and parse it as CSV
body = sess.resource("s3").Object("silicon-myfiles", "elevationdata.csv").get()["Body"]
df = pd.read_csv(io.StringIO(body.read().decode()))
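The answer reads a file, while the question asks about writing one to an in-house endpoint. A minimal sketch of the write direction follows, assuming boto3's `endpoint_url` argument is pointed at the storage grid URL from the question and that credentials are picked up from the default credentials file. The helper names `make_client` and `upload_dataframe_csv` are hypothetical.

```python
import io


def make_client(endpoint_url):
    """Create an S3 client that talks to a custom endpoint instead of AWS."""
    import boto3  # imported here so the upload helper has no hard dependency
    return boto3.client("s3", endpoint_url=endpoint_url)


def upload_dataframe_csv(s3_client, df, bucket, key):
    """Serialize df to CSV in memory and upload it as a single object.

    Returns the number of bytes uploaded.
    """
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    body = buffer.getvalue().encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)
```

With the question's values this would be `upload_dataframe_csv(make_client("https://myCompanys3.storage.net"), df, "test_bucket", "myTest.csv")`. Passing the client in as an argument keeps the upload logic easy to exercise with a stub.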

