
How to access csv file from Google Cloud Storage in a Google Cloud Function via Pandas?

I'm new to Cloud Functions, so I followed the default GCP Cloud Function "hello world" tutorial. It worked fine and printed "hello world" as expected. The only change I made to requirements.txt was adding pandas and google-cloud-storage. Likewise, my only edits to main.py were in the imports section before the function definition and in the else branch of the function.

requirements.txt

pandas 
google-cloud-storage

main.py:

import pandas as pd
from google.cloud import storage   

def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:       
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('temp.csv')        
        with open('temp.csv','rb') as f:
            df = pd.read_csv(f)
        
        return str(df.columns)

When I test the function in GCP's "test cloud function" area, the following errors are captured in the logs. The first seven frames appear to be framework boilerplate, while the last two are specific to my program: the call to blob.download_to_filename in main.py fails with OSError: [Errno 30] Read-only file system: 'temp.csv'. I have no idea why this error is being raised.

Errors:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 87, in view_func
    return function(request._get_current_object())
  File "/workspace/main.py", line 25, in hello_world
    blob.download_to_filename('temp.csv')
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1183, in download_to_filename
    with open(filename, "wb") as file_obj:
OSError: [Errno 30] Read-only file system: 'temp.csv'

For context, I've already added credentials to the appropriate service account, which this cloud function uses according to the configuration I set up. So, authorization aside, I have no idea why the function is failing. What should I change?

To be clear, I'm simply trying to open an arbitrary CSV file from Cloud Storage in pandas and return the column names as a string. This has no practical value; it's just a functional test before building something useful.

Edit 1: The IAM role granted to the service account used by this cloud function is 'roles/editor', which should be sufficient as far as I know.

Edit 2: It appears that GCP Cloud Functions run in a read-only environment, so there must be some other way to open the file without using blob.download_to_filename.
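One such way is to skip the file system entirely and read the object into memory. This is a minimal sketch, assuming the installed google-cloud-storage release provides Blob.download_as_bytes(); the helper name read_csv_from_blob is hypothetical, and any object with a download_as_bytes() method will work.

```python
import io

import pandas as pd


def read_csv_from_blob(blob) -> pd.DataFrame:
    """Parse a CSV stored in Cloud Storage without writing anything to disk.

    `blob` is assumed to be a google.cloud.storage.Blob (or any object
    exposing download_as_bytes()).
    """
    # The whole object is downloaded into memory as a bytes value.
    data = blob.download_as_bytes()
    # Wrap the bytes in a file-like object so pandas can read it directly.
    return pd.read_csv(io.BytesIO(data))
```

In the else branch of the question's function, this would replace the download/open pair with df = read_csv_from_blob(bucket.blob(model_filename)). Because the entire file is held in memory, the function's memory allocation still has to accommodate the file size.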

Since you are new to Cloud Functions, there are a few things to know and some traps to avoid. One of them: Cloud Functions is stateless, and you can't write to the file system.

The exception is the /tmp directory. It's an in-memory file system, so size your Cloud Function's memory to account for both your app's memory footprint and the size of the files stored in /tmp.

Update your Cloud Function like this:

....
    else:       
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('/tmp/temp.csv')        
        with open('/tmp/temp.csv','rb') as f:
            df = pd.read_csv(f)
        
        return str(df.columns)
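Because /tmp lives in memory, a file left there can keep consuming the instance's memory across invocations on a warm instance. The following is a minimal sketch of a variant that cleans up after itself. It assumes that tempfile picks /tmp on the Cloud Functions runtime; on Linux, tempfile uses the first writable candidate among TMPDIR and similar variables, then /tmp. The helper name read_csv_via_tmp is hypothetical.

```python
import os
import tempfile

import pandas as pd


def read_csv_via_tmp(blob, filename: str = "temp.csv") -> pd.DataFrame:
    """Download a blob into a fresh temporary directory, parse it, then clean up.

    `blob` is assumed to be a google.cloud.storage.Blob (or any object
    exposing download_to_filename(path)).
    """
    # TemporaryDirectory is created under tempfile.gettempdir() (/tmp on Linux
    # unless TMPDIR points elsewhere) and is deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        blob.download_to_filename(path)
        # pandas reads the path directly, so no explicit open() is needed.
        return pd.read_csv(path)
```

Using a fresh directory for each call also prevents two requests handled by the same instance from overwriting a shared temp.csv.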
