I'm trying to run this script as a cloud function, but it keeps throwing errors. I need help figuring out what's going wrong, how to fix it, and how to get it working as a cloud function.
Here's the full script. I edited out some details like the URL and client-secret names, but those aren't what's driving the error.
from tracemalloc import start
from google.oauth2 import service_account
from googleapiclient.discovery import build
import requests
import json
import pandas as pd
from google.cloud import bigquery
from datetime import date, timedelta, datetime
PROPERTIES = ["https://example.com"]
BQ_DATASET_NAME = 'gsc_2022'
BQ_TABLE_NAME = 'pipeline-3'
CRED_PATH = "example_credential_path.json"
LOCATION = "us-central1"
start_date = (datetime.now()-timedelta(days=2)).strftime("%Y-%m-%d")
end_date = (datetime.now()-timedelta(days=2)).strftime("%Y-%m-%d")
start_row = 0
version = 'v1'
SCOPES = ['https://www.googleapis.com/auth/webmasters']
credentials = service_account.Credentials.from_service_account_file(
    CRED_PATH, scopes=SCOPES)

def get_sc_df(site_url, start_date, end_date, start_row):
    # Grab Search Console data for the specific property and send it to BigQuery
    service = build('webmasters', 'v3', credentials=credentials)
    request = {
        'startDate': start_date,
        'endDate': end_date,
        'dimensions': ['date', 'query', 'page', 'device', 'country'],  # uneditable to enforce a nice clean dataframe at the end!
        'rowLimit': 25000,
        'startRow': start_row
    }
    response = service.searchanalytics().query(siteUrl=site_url, body=request).execute()
    if len(response) > 1:
        x = response['rows']
        df = pd.DataFrame.from_dict(x)
        # split the keys list into columns
        df[['query', 'device', 'page', 'date', 'country']] = pd.DataFrame(df['keys'].values.tolist(), index=df.index)
        # Drop the key columns
        result = df.drop(['keys'], axis=1)
        # Add a website identifier
        #result['website'] = site_url
        # establish a BigQuery client
        client = bigquery.Client.from_service_account_json(CRED_PATH)
        dataset_id = BQ_DATASET_NAME
        table_name = BQ_TABLE_NAME
        # create a job config
        job_config = bigquery.LoadJobConfig()
        # Set the destination table
        table_ref = client.dataset(dataset_id).table(table_name)
        #job_config.destination = table_ref
        job_config.write_disposition = 'WRITE_TRUNCATE'
        load_job = client.load_table_from_dataframe(result, table_ref, job_config=job_config)
        load_job.result()
        return result
    else:
        print("There are no more results to return.")

# Loop through all defined properties, for up to 100,000 rows of data in each
for x in range(0, 100000, 25000):
    y = get_sc_df(PROPERTIES, start_date, end_date, start_row)
    if len(y) < 25000:
        break
    else:
        continue
I took it from this article: https://medium.com/@singularbean/revisiting-google-search-console-data-into-google-bigquery-708a19e2f746
After running the script, here is the error I get (I changed the actual error URL to example.com):
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://www.googleapis.com/webmasters/v3/sites/%5B%27https%3A%2F%2Fwww.example.com%2F%27%5D/searchAnalytics/query?alt=json returned "Request contains
an invalid argument.". Details: "[{'message': 'Request contains an invalid argument.', 'domain': 'global', 'reason': 'badRequest'}]">
There's no documentation that I can find that provides a guide on how to do this from A to Z. So if anyone knows of a better method that has worked for you, please let me know!
Two issues:
Your siteUrl argument. You're passing the whole PROPERTIES list to the API instead of a single property string — you can see the bracketed list URL-encoded right in the error URL (%5B%27https...%27%5D). Pass one property at a time, e.g. get_sc_df(PROPERTIES[0], ...). (If your property is a Domain property rather than a URL-prefix property, its siteUrl looks like "sc-domain:example.com".)
Your authentication. First, run the OAuth flow locally and get a token for the user associated with the GSC account. Then, in the cloud function, load that user's authorization JSON and refresh the token if it has expired.
This way you can run it in a cloud function.
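The refresh step in point 2 can be sketched like this — a minimal sketch, assuming you already ran the OAuth flow locally once and saved the authorized-user JSON; TOKEN_PATH and load_user_credentials are placeholders, not part of the original script:

```python
# Minimal sketch: load a stored user token and refresh it if expired.
# TOKEN_PATH is a placeholder; load_user_credentials is a hypothetical helper.
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

SCOPES = ['https://www.googleapis.com/auth/webmasters']
TOKEN_PATH = 'token.json'  # placeholder: the file saved by the local OAuth flow

def load_user_credentials(token_path=TOKEN_PATH):
    """Load stored user credentials and refresh the access token if expired."""
    creds = Credentials.from_authorized_user_file(token_path, scopes=SCOPES)
    if creds.expired and creds.refresh_token:
        # Network call: trades the long-lived refresh token for a new access token.
        creds.refresh(Request())
    return creds
```

You'd then pass the returned credentials to build('webmasters', 'v3', credentials=creds) instead of the service-account credentials.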
Here is some sample code to help. Let me know if you have questions.
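A minimal sketch of the corrected query loop (build_request and fetch_all_rows are hypothetical names; the key points are passing a single siteUrl string, not the list, and advancing startRow on each page):

```python
# Sketch only: fix for point 1 plus pagination. The original code passed the
# PROPERTIES list as siteUrl and never advanced start_row between calls.

def build_request(start_date, end_date, start_row, row_limit=25000):
    """Build the Search Console query body; startRow advances per page."""
    return {
        'startDate': start_date,
        'endDate': end_date,
        'dimensions': ['date', 'query', 'page', 'device', 'country'],
        'rowLimit': row_limit,
        'startRow': start_row,
    }

def fetch_all_rows(service, site_url, start_date, end_date, row_limit=25000):
    """Page through results until a page comes back with fewer than row_limit rows."""
    rows = []
    start_row = 0
    while True:
        body = build_request(start_date, end_date, start_row, row_limit)
        # siteUrl must be a single property string, e.g. "https://example.com/",
        # never the PROPERTIES list itself.
        response = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
        page = response.get('rows', [])
        rows.extend(page)
        if len(page) < row_limit:
            break
        start_row += row_limit
    return rows
```

With that in place, the outer loop becomes `for site_url in PROPERTIES: fetch_all_rows(service, site_url, start_date, end_date)`, and the BigQuery load can happen once per property on the combined rows.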