
Python throwing error: Expected bytes, got a 'dict' object

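I am trying to get results from the Google URL Inspection API and save them to a Google BigQuery table. Everything is fine with the URL Inspection API and with BigQuery; I know this because I am already using other APIs and saving their data to BigQuery.

However, this Python script throws an error when executed. The error is shown below. It looks like the problem involves a dict object, but since I am new to Python I cannot figure out exactly where it comes from. Could someone please help me?

Also, if possible, could you suggest the best solution for what I am trying to achieve? Thanks.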

Traceback (most recent call last):
  File "E:\python\gsconsole_online\urlinspection1\u_inspect_module.py", line 105, in <module>
    load_job = bigQueryClient.load_table_from_dataframe(result, table_ref, job_config=job_config)
  File "C:\Python310\lib\site-packages\google\cloud\bigquery\client.py", line 2628, in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(
  File "C:\Python310\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 672, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
  File "C:\Python310\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 617, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "C:\Python310\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py", line 342, in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow\array.pxi", line 1033, in pyarrow.lib.Array.from_pandas
  File "pyarrow\array.pxi", line 312, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 123, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object

Here is the Python script.

from google.oauth2 import service_account
from googleapiclient.discovery import build
from google.cloud import bigquery
from google.cloud.exceptions import NotFound
import pandas as pd

indexScopes = [
    'https://www.googleapis.com/auth/webmasters',
    'https://www.googleapis.com/auth/webmasters.readonly'
    ]

indexCredentials = service_account.Credentials.from_service_account_file("credentials.json", scopes=indexScopes)
indexService = build('searchconsole','v1',credentials=indexCredentials)

indexRequest = {
    'inspectionUrl': 'https://example.com/',
    'siteUrl': 'https://example.com/'
}

response = indexService.urlInspection().index().inspect(body=indexRequest).execute()
inspectionResult = response['inspectionResult']

full_table_name = "bigquery-project.dataset.table"

from time import gmtime, strftime
current_datetime = strftime("%Y-%m-%d %H:%M:%S", gmtime())

result = {"site": [], "json_response": [], "created_at":[], "updated_at":[]}

result["site"].append('https://example.com/')
result["json_response"].append(inspectionResult)
result["created_at"].append(str(current_datetime))
result["updated_at"].append(str(current_datetime))

result = pd.DataFrame.from_dict(result)

bigQueryScopes = ['https://www.googleapis.com/auth/bigquery']
bigQuerycredentials = service_account.Credentials.from_service_account_file("bigquery-consonle.json", scopes=bigQueryScopes)

bigQueryClient = bigquery.Client(credentials=bigQuerycredentials)

try:
    table_ref = bigQueryClient.get_table(full_table_name)  # Make an API request.
except NotFound:
    schema = [
        bigquery.SchemaField("site", "STRING"),
        bigquery.SchemaField("json_response", "STRING"),
        bigquery.SchemaField("created_at", "DATETIME"),
        bigquery.SchemaField("updated_at", "DATETIME"),
    ]

    table     = bigquery.Table(full_table_name, schema=schema)
    table     = bigQueryClient.create_table(table)
    table_ref = bigQueryClient.get_table(full_table_name)

job_config = bigquery.LoadJobConfig(schema = [
        bigquery.SchemaField("site", "STRING"),
        bigquery.SchemaField("json_response", "STRING"),
        bigquery.SchemaField("created_at", "DATETIME"),
        bigquery.SchemaField("updated_at", "DATETIME"),
    ], autodetect=False)

#job_config.destination = table_ref
job_config.write_disposition = 'WRITE_APPEND'

load_job = bigQueryClient.load_table_from_dataframe(result, table_ref, job_config=job_config)
load_job.result()

The inspectionResult appended to json_response in the result DataFrame is a dict, whereas the schema declares that column as STRING.

Serialize inspectionResult to a JSON-encoded string:

import json

inspectionResult = json.dumps(response['inspectionResult'])
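To show where the serialized value fits into the row the script builds, here is a minimal sketch using only the standard library. `build_row` is a hypothetical helper introduced for illustration, and `sample_result` is a made-up stand-in for `response['inspectionResult']`; neither comes from the original script.

```python
import json
from time import gmtime, strftime


def build_row(site, inspection_result):
    """Return one row shaped like the `result` dict in the question.

    Hypothetical helper for illustration; not part of any library.
    """
    now = strftime("%Y-%m-%d %H:%M:%S", gmtime())
    return {
        "site": [site],
        # json.dumps turns the nested dict into a str, matching the STRING column.
        "json_response": [json.dumps(inspection_result)],
        "created_at": [now],
        "updated_at": [now],
    }


# Made-up stand-in for response['inspectionResult']; real responses carry more fields.
sample_result = {"indexStatusResult": {"verdict": "PASS"}}

row = build_row("https://example.com/", sample_result)

# Every json_response value is now a plain string ...
assert all(isinstance(v, str) for v in row["json_response"])
# ... and the original structure can be recovered with json.loads.
assert json.loads(row["json_response"][0]) == sample_result
```

The resulting `row` can then be passed to `pd.DataFrame.from_dict(row)` and loaded as in the question. Storing the response as a JSON string keeps the table schema simple, and it can be parsed again later with `json.loads`.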

