
How can I read public files from Google Cloud Storage remotely using Python?

I need to read some CSV files that were shared in Google Cloud Storage. My script will run from another server outside Google Cloud.

I am using this code:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('/stats/installs')
blob = storage.Blob('installs_overview.csv', bucket)  
content = blob.download_as_string()

print(content)

Getting this error. Apparently I haven't specified the project, but I don't have one:

OSError: Project was not passed and could not be determined from the environment.

There are some wrong assumptions in the previous answers to this question.

If it is a public bucket, you do not have to worry about which project it is connected to. It is well documented how you can, for example, use a bucket to host a public website that browsers can access. Obviously the browser does not have to worry about which project the bucket belongs to.

The code samples are a bit lacking when it comes to public buckets and files: in all the examples you supply a project and credentials, which will

1) bill bucket egress to the project you supply instead of the project the bucket is connected to, and

2) assume that you need to authenticate and authorize.

For a public file or bucket, however, all you have to worry about is the bucket name and the file location.

You can do the following:

from google.cloud import storage
source="path/to/file/in/bucket.txt"
target="/your/local/file.txt"
client = storage.Client.create_anonymous_client()
# you need to set user_project to None for anonymous access
# If not it will attempt to put egress bill on the project you specify,
# and then you need to be authenticated to that project.
bucket = client.bucket(bucket_name="your-bucket", user_project=None)
blob = storage.Blob(source, bucket)
blob.download_to_filename(filename=target, client=client)

It is important that the file in your bucket has read access for "allUsers".
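As a concrete sketch of the anonymous approach above applied to CSV files (the bucket and object names in the usage comment are placeholders, not the asker's real names), the downloaded bytes can be parsed with the standard csv module:

```python
import csv
import io

def rows_from_csv_bytes(data: bytes, encoding: str = "utf-8") -> list:
    """Parse raw CSV bytes (as returned by download_as_bytes) into rows."""
    return list(csv.reader(io.StringIO(data.decode(encoding))))

def read_public_csv(bucket_name: str, blob_path: str) -> list:
    """Download a publicly readable CSV from GCS without any credentials."""
    # Imported inside the function so the parsing helper above still works
    # where google-cloud-storage is not installed.
    from google.cloud import storage

    client = storage.Client.create_anonymous_client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_path)
    # Note: older versions of the library call this download_as_string().
    return rows_from_csv_bytes(blob.download_as_bytes())

# Hypothetical usage:
# rows = read_public_csv("your-bucket", "installs_overview.csv")
```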

First of all, I think there might be some confusion regarding Cloud Storage and how to access it. Cloud Storage is a Google Cloud Platform product, and therefore, to use it, a GCP project must exist. You can find the project number and project ID for your project on the Home page of the Console, as explained in this documentation page.

That being said, let me refer you to the documentation page for the Python Cloud Storage Client Library. When you create the client to use the service, you can optionally specify the project ID and/or the credentials to use:

client = storage.Client(project="PROJECT_ID",credentials="OAUTH2_CREDS")

If you do not specify the project ID, it will be inferred from the environment.

Also, take into account that you must set up authentication in order to use the service. If you were running the application inside another GCP service (Compute Engine, App Engine, etc.), the recommended approach would be to use the Application Default Credentials. However, given that that is not your case, you should instead follow this guide to set up authentication: download the key for a Service Account that has permission to work with Cloud Storage, and point to it in the environment variable GOOGLE_APPLICATION_CREDENTIALS.
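A minimal sketch of that setup, assuming you have already downloaded the service-account key to some local path (the path in the usage comment is hypothetical):

```python
import os

def client_from_key(key_path: str):
    """Build an authenticated Storage client from a downloaded
    service-account JSON key file."""
    if not os.path.exists(key_path):
        raise FileNotFoundError(f"service account key not found: {key_path}")
    # Equivalent to exporting GOOGLE_APPLICATION_CREDENTIALS=key_path
    # before creating a plain storage.Client(); imported lazily so the
    # file-existence check works even without the library installed.
    from google.cloud import storage
    return storage.Client.from_service_account_json(key_path)

# client = client_from_key("/path/to/key.json")  # hypothetical path
```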

Also, it looks like the configuration in your code is not correct, given that the bucket name you are using ('/stats/installs') is not valid:

Bucket names must be between 3 and 63 characters. A bucket name can contain lowercase alphanumeric characters, hyphens, and underscores. It can contain dots (.) if it forms a valid domain name with a top-level domain (such as .com). Bucket names must start and end with an alphanumeric character.
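Those naming rules can be sanity-checked before calling the API. Here is a rough sketch covering only the simple dot-free case (names containing dots follow extra DNS-related rules not modeled here):

```python
import re

# Simple bucket names: 3-63 chars of lowercase letters, digits,
# hyphens, and underscores, starting and ending alphanumerically.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")

def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic (dot-free) bucket-name rules."""
    return bool(_BUCKET_RE.match(name))
```

The name from the question fails this check: `looks_like_valid_bucket_name('/stats/installs')` is `False`, because of the forward slashes.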

Note that you can detect that the given bucket does not exist by working with exceptions, specifically google.cloud.exceptions.NotFound. Also, given that the files you are trying to access are public, I would recommend not sharing the real bucket and file names; you can just obfuscate them with placeholders such as <BUCKET_NAME> and <FILE_NAME>.

So, as a summary, the course of action should be:

  1. Identify the project to which the bucket you want to work with belongs.
  2. Obtain the right credentials to work with GCS in that project.
  3. Add the project and credentials to the code.
  4. Fix the code you shared with the correct bucket and file names. Note that if the file is inside a folder (even though in GCS the concept of directories itself does not exist, as I explained in this other question), the file name in storage.Blob() should include the complete path, like path/to/file/file.csv.
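Putting steps 3 and 4 together: the question's '/stats/installs' mixes what should be a bucket name with an object path. A sketch of separating the two (the project, bucket, and file names in the usage comment are placeholders, since the real ones are obfuscated in the question):

```python
def split_gcs_path(path: str):
    """Split 'bucket/dir/file.csv' into ('bucket', 'dir/file.csv')."""
    bucket_name, _, blob_path = path.strip("/").partition("/")
    return bucket_name, blob_path

def download_object(project_id: str, gcs_path: str) -> bytes:
    """Download an object using an explicit project (credentials come
    from GOOGLE_APPLICATION_CREDENTIALS, as set up above)."""
    from google.cloud import storage

    bucket_name, blob_path = split_gcs_path(gcs_path)
    client = storage.Client(project=project_id)
    bucket = client.get_bucket(bucket_name)  # raises NotFound if missing
    blob = bucket.blob(blob_path)
    return blob.download_as_bytes()

# Hypothetical usage:
# content = download_object("my-project-id", "stats/installs/installs_overview.csv")
```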

I am not a Google Cloud expert, but as some of the commenters have said, I think the problem is that you haven't explicitly told the storage client which project you are talking about. The error message implies that the storage client tries to figure out for itself which project you are referring to, and if it can't, it gives that error message. When I use the storage client I normally just provide the project name as an argument and it seems to do the trick, e.g.:

client = storage.Client(project='my-uber-project')

Also, I just saw your comment that your bucket "doesn't have a project" - I don't understand how this is possible. If you log in to the Google Cloud Console and go to Storage, surely your bucket is listed there, and you can see your project name at the top of the page?

As @Mangu said, the bucket name in your code is presumably just there to hide the real bucket name, as forward slashes are not allowed in bucket names (but are allowed in blob names, where they can be used to represent 'folders').
