
How to access csv file from Google Cloud Storage in a Google Cloud Function via Pandas?

I'm new to Cloud Functions, so I followed the default GCP Cloud Function "hello world" tutorial. It worked fine and printed "hello world" as expected. The only change I made to requirements.txt was adding pandas and google-cloud-storage. In main.py, all my edits were in the imports section before the function definition and in the else branch of the function.




import pandas as pd
from google.cloud import storage

def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('temp.csv')
        with open('temp.csv','rb') as f:
            df = pd.read_csv(f)
        return str(df.columns)

When I test the function in GCP's "test cloud function" area, the following errors are captured in the logs. The first seven frames seem to be boilerplate, while the last two are specific to my program: download_to_filename in google/cloud/storage/blob.py fails with OSError: [Errno 30] Read-only file system: 'temp.csv'. I have no idea why this error is being triggered.


Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 87, in view_func
    return function(request._get_current_object())
  File "/workspace/main.py", line 25, in hello_world
    blob.download_to_filename('temp.csv')
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1183, in download_to_filename
    with open(filename, "wb") as file_obj:
OSError: [Errno 30] Read-only file system: 'temp.csv'

For context, I've already added credentials to the appropriate service account, which this Cloud Function uses according to the configuration I set up. So, authorization aside, I have no idea why the function is failing. What should I change?

To be clear, I'm simply trying to open an arbitrary CSV file from Cloud Storage in pandas and return the column names as a string. This has no practical value; it's just a functional test before building something useful.

Edit 1: The IAM role granted to the service account used by this Cloud Function is roles/editor, which should be sufficient as far as I know.

Edit 2: It appears that GCP Cloud Functions run in a read-only environment, so there must be some other way to open the file without using blob.download_to_filename.

Since you are new to Cloud Functions, there are a few things to know and some traps to avoid. One of them: Cloud Functions is stateless, and you can't write to the file system.

The exception is the /tmp directory. It's an in-memory file system, so size your Cloud Function's memory allocation to cover your app's memory footprint plus the size of the files you store in /tmp.
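Because anything written to /tmp counts against the function's memory, it helps to delete a downloaded file once pandas has read it. The sketch below does that with a context manager. The helper names downloaded_to_tmp and csv_column_names are hypothetical, and it assumes tempfile resolves to /tmp, which is the usual default on Linux unless TMPDIR points elsewhere.

```python
import os
import tempfile
from contextlib import contextmanager

import pandas as pd


@contextmanager
def downloaded_to_tmp(blob, suffix=".csv"):
    """Download `blob` to a unique temp file and delete it when done.

    `blob` is assumed to be a google.cloud.storage Blob, or any object with a
    download_to_filename(path) method.
    """
    # mkstemp creates the file under tempfile.gettempdir(); on Linux this is
    # normally /tmp, the writable directory in Cloud Functions (assumption).
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # download_to_filename opens the path itself
    try:
        blob.download_to_filename(path)
        yield path
    finally:
        # Files left in /tmp keep using the instance's memory, so remove them.
        if os.path.exists(path):
            os.remove(path)


def csv_column_names(blob) -> str:
    """Return the CSV's column names as a string, like the question's function."""
    with downloaded_to_tmp(blob) as path:
        df = pd.read_csv(path)
    return str(df.columns)
```

Inside the question's else branch, the body could then shrink to something like return csv_column_names(bucket.blob('my_file.csv')).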

Update your Cloud Function like this:

        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        # /tmp is the only writable location in Cloud Functions
        blob.download_to_filename('/tmp/temp.csv')
        with open('/tmp/temp.csv','rb') as f:
            df = pd.read_csv(f)
        return str(df.columns)
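Edit 2 in the question asks whether the file can be opened without download_to_filename at all. One option, sketched here as an alternative to the /tmp approach rather than a replacement for it, is to download the object into memory with the storage client's Blob.download_as_bytes() and hand pandas a BytesIO buffer. The function name read_blob_csv is hypothetical.

```python
import io

import pandas as pd


def read_blob_csv(blob) -> pd.DataFrame:
    """Parse a CSV blob with pandas without writing anything to disk.

    `blob` is assumed to be a google.cloud.storage Blob (or any object with a
    download_as_bytes() method). The whole object is held in memory, so it
    still has to fit within the function's memory allocation.
    """
    data = blob.download_as_bytes()       # raw file contents as bytes
    return pd.read_csv(io.BytesIO(data))  # wrap in a file-like buffer for pandas
```

In the question's else branch this would reduce to return str(read_blob_csv(bucket.blob('my_file.csv')).columns). Very old releases of google-cloud-storage may only provide download_as_string, so check the installed version if the call is missing.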
