
I'm having trouble figuring out how to automate OAuth authentication to access Google Drive. (Python)

What we want to solve

  1. access the shared drive with a user account via OAuth authentication
  2. retrieve the spreadsheet and convert it to parquet format
  3. save to GCS

These steps are implemented in the main() function below, and I would like to run them as a daily scheduled job using Cloud Functions and Cloud Scheduler.
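For the scheduling side, a minimal sketch of the deployment could look like the following. This assumes a Pub/Sub-triggered function whose entry point is main(); the function, topic, region, and schedule names here are placeholders, not values from the question:

```shell
# Deploy main() as a Pub/Sub-triggered Cloud Function
# (function, topic, and region names are hypothetical)
gcloud functions deploy daily-drive-to-gcs \
    --entry-point main \
    --runtime python310 \
    --trigger-topic daily-drive-to-gcs \
    --region asia-northeast1

# Publish to that topic once a day at 06:00 via Cloud Scheduler
gcloud scheduler jobs create pubsub daily-drive-to-gcs-job \
    --schedule "0 6 * * *" \
    --topic daily-drive-to-gcs \
    --message-body "run" \
    --time-zone "Asia/Tokyo"
```

A Pub/Sub trigger matches the main(event, context) signature used in the code below.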

However, as it stands, the code below requires the user to open a browser and manually log in to their Google account. I would like to rewrite it so that this login happens automatically, but I can't figure out how... I would appreciate any help.


### ※※Authentication is required by the browser※※
creds = flow.run_local_server(port=0)
### Result
Please visit this URL to authorize this application: 
https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=132987612861-
4j24afrouontpeiv5ryy7sn64inhr.apps.googleusercontent.com&redirect_uri=
http%yyy%2Flocalhost%3yy6%2F&scope=httpsyyF%2Fwww.googleapis.com%2Fauth%2Fdrive.
readonly&state=XXXXXXXXXXXXXXXXXXXXXXXXXXX&access_type=offline

The readonly&state=XXXXXXXXXXXXXXXXXXXXX part of the URL changes on each execution.

Browser screen that transitions when the above code section is executed

The entire relevant source code

from __future__ import print_function
import io
import os
import key
import json
import os.path
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from pprint import pprint
from webbrowser import Konqueror
from google.cloud import storage as gcs
from google.oauth2 import service_account
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.http import MediaIoBaseDownload, MediaIoBaseUpload, MediaFileUpload
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

def main(event, context):
    """Drive v3 API
    Access the shared Drive → get the Spreadsheet → convert to parquet → upload to GCS.
    """
    creds = None
    file_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx' #Unedited data in shared drive
    mime_type = 'text/csv'

    # OAuth authentication to access shared drives
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # Allow the user to log in if no (valid) credentials are available
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            ### ※※Browser authentication required※※
            creds = flow.run_local_server(port=0)  # Currently, a manual login is needed here!
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    try:
        # Retrieve spreadsheets from shared drives
        service = build('drive', 'v3', credentials=creds)
        request = service.files().export_media(fileId=file_id, mimeType=mime_type)
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()

        # Read "Shared Drive/SpreadSheet" -> convert to parquet
        df = pd.read_csv(io.StringIO(fh.getvalue().decode()))
        table = pa.Table.from_pandas(df)
        buf = pa.BufferOutputStream()
        pq.write_table(table, buf, compression=None)

        # service_account for save to GCS
        key_path = 'service_account_file.json'
        service_account_info = json.load(open(key_path))
        credentials = service_account.Credentials.from_service_account_info(service_account_info)
        client = gcs.Client(
            credentials=credentials,
            project=credentials.project_id,
        )

        # GCS information to be saved 
        bucket_name = 'bucket-name'
        blob_name = 'sample-folder/daily-data.parquet'#save_path
        bucket = client.get_bucket(bucket_name)
        blob = bucket.blob(blob_name)

        # parquet save to GCS
        blob.upload_from_string(data=buf.getvalue().to_pybytes())
        # If this prints, the data has been saved.
        print("Blob '{}' created to '{}'!".format(blob_name, bucket_name))

    except HttpError as error:
        # TODO(developer) - Handle errors from drive API.
        print(f'An error occurred: {error}')

What I tried by myself

I tried using selenium to drive the browser, but could not get it to work because the login URL is different on each run. (There may still be a way to do this.)

Try this approach. It worked for me!

The solution is to create a service account and share your data folder with the service account's e-mail address.

Drive API Service account
