
Convert a Python Script into a Google Cloud Function: Exporting Search Console data into BigQuery

I'm trying to find a way to run this script as a Cloud Function, but it keeps throwing errors. I need help figuring out what is going wrong, how to fix it, and how to get it working as a Cloud Function.

Here's the full script. I edited out some details, like the URL and the client secret file name, but those aren't what's driving the error.

    from tracemalloc import start
    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    import requests
    import json
    import pandas as pd
    from google.cloud import bigquery
    from datetime import date, timedelta, datetime



    PROPERTIES = ["https://example.com"]
    BQ_DATASET_NAME = 'gsc_2022'
    BQ_TABLE_NAME = 'pipeline-3'
    CRED_PATH = "example_credential_path.json"
    LOCATION = "us-central1"
    start_date = (datetime.now()-timedelta(days=2)).strftime("%Y-%m-%d")
    end_date = (datetime.now()-timedelta(days=2)).strftime("%Y-%m-%d")
    start_row = 0
    version = 'v1'

    SCOPES = ['https://www.googleapis.com/auth/webmasters']
    credentials = service_account.Credentials.from_service_account_file(
            CRED_PATH, scopes=SCOPES)

    def get_sc_df(site_url,start_date,end_date,start_row):
        #Grab Search Console data for the specific property and send it to BigQuery
        service = build('webmasters', 'v3', credentials=credentials)
        request = {
          'startDate': start_date,
          'endDate': end_date,
          'dimensions': ['date','query', 'page', 'device','country'], # uneditable to enforce a nice clean dataframe at the end!
          'rowLimit': 25000,
          'startRow': start_row
           }

        response = service.searchanalytics().query(siteUrl=site_url, body=request).execute()

        if len(response) > 1:

            x = response['rows']

            df = pd.DataFrame.from_dict(x)
            
            # split the keys list into columns
            df[['query','device', 'page', 'date','country']] = pd.DataFrame(df['keys'].values.tolist(), index= df.index)
            
            # Drop the key columns
            result = df.drop(['keys'],axis=1)

            # Add a website identifier
            #result['website'] = site_url

            # establish a BigQuery client
            client = bigquery.Client.from_service_account_json(CRED_PATH)
            dataset_id = BQ_DATASET_NAME
            table_name = BQ_TABLE_NAME
            # create a job config
            job_config = bigquery.LoadJobConfig()
            # Set the destination table
            table_ref = client.dataset(dataset_id).table(table_name)
            #job_config.destination = table_ref
            job_config.write_disposition = 'WRITE_TRUNCATE'

            load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
            load_job.result()

            return result
        
        else:
            print("There are no more results to return.")

    # Loop through all defined properties, for up to 100,000 rows of data in each
    for x in range(0,100000,25000):
        y = get_sc_df(PROPERTIES, start_date, end_date, start_row)
        if len(y) < 25000:
            break
        else:
            continue

I took it from this article: https://medium.com/@singularbean/revisiting-google-search-console-data-into-google-bigquery-708a19e2f746

After running the script, here is the error I get (I changed the actual URL in the error to example.com):

googleapiclient.errors.HttpError: <HttpError 400 when requesting https://www.googleapis.com/webmasters/v3/sites/%5B%27https%3A%2F%2Fwww.example.com%2F%27%5D/searchAnalytics/query?alt=json returned "Request contains 
an invalid argument.". Details: "[{'message': 'Request contains an invalid argument.', 'domain': 'global', 'reason': 'badRequest'}]">

There's no documentation that I can find that provides a guide on how to do this from A to Z. So if anyone knows of a better method that has worked for you, please do let me know!

Two issues:

  1. The property: siteUrl should just be the domain, e.g. PROPERTIES = ["example.com"], and it has to be passed as a single string rather than the whole list (the error URL shows the encoded list being sent). See the sketch below.

  2. Your authentication. First, run the script locally and complete the OAuth flow for the user associated with the GSC account. Then, in the Cloud Function, reuse that user's authorization JSON and refresh the token when it has expired.

This way you can run it in a Cloud Function.
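
As an illustration only, here is a minimal sketch of a corrected loop, reusing get_sc_df() and the date variables from the question; note that start_row also has to advance on each pass, which the posted loop never does:

    # Hypothetical corrected loop: pass a single property string as siteUrl,
    # and advance start_row so each 25,000-row page is requested.
    for site_url in PROPERTIES:
        for start_row in range(0, 100000, 25000):
            result = get_sc_df(site_url, start_date, end_date, start_row)
            # get_sc_df() returns None once the API has no more rows to send
            if result is None or len(result) < 25000:
                break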

Here is some sample code to help. Let me know if you have any questions.
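
As an illustration only (not the original sample), a minimal sketch of the authentication idea, assuming the user's OAuth credentials from the local run were saved as token.json; the file name and the entry-point name are placeholders:

    # Hypothetical Cloud Function entry point (HTTP-triggered).
    # Assumes token.json holds the user's OAuth credentials saved from a local run.
    from google.oauth2.credentials import Credentials
    from google.auth.transport.requests import Request
    from googleapiclient.discovery import build

    SCOPES = ['https://www.googleapis.com/auth/webmasters']

    def load_user_credentials(token_path='token.json'):
        # Rebuild the user credentials produced by the local OAuth flow
        creds = Credentials.from_authorized_user_file(token_path, scopes=SCOPES)
        if creds.expired and creds.refresh_token:
            creds.refresh(Request())  # renew the expired access token
        return creds

    def main(request):
        # HTTP entry point: build the Search Console service with the user's
        # credentials, then query and load into BigQuery as in the script above.
        creds = load_user_credentials()
        service = build('webmasters', 'v3', credentials=creds)
        # ... service.searchanalytics().query(siteUrl=..., body=...).execute()
        #     and load_table_from_dataframe(...) as before ...
        return 'OK'

Keeping the user's refresh token with the function (for example in Secret Manager) lets it renew the access token on each run without any interactive login.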
