
Spark Read BigQuery External Table

I am trying to read an external table from BigQuery, but I get an error.

    SCALA_VERSION="2.12"
    SPARK_VERSION="3.1.2"
    com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0,
    com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2'

    table = 'data-lake.dataset.member'
    df = spark.read.format('bigquery').load(table)
    df.printSchema()

Result:

    root
      |-- createdAtmetadata: date (nullable = true)
      |-- eventName: string (nullable = true)
      |-- producerName: string (nullable = true)

So the schema loads, but when I try to show the rows:

    df.createOrReplaceTempView("member")
    spark.sql("select * from member limit 100").show()

I get this error message:

    INVALID_ARGUMENT: request failed: Only external tables with connections can be read with the Storage API.

Since reading the external table directly from Spark is not supported, I tried another way and it worked!

    def read_query_bigquery(project, query):
        # Run the query in BigQuery and read the materialized result
        df = spark.read.format('bigquery') \
            .option('parentProject', project) \
            .option('query', query) \
            .option('viewsEnabled', 'true') \
            .load()
        return df

    project = 'data-lake'
    query = 'select * from data-lake.dataset.member'
    # Dataset where the connector stores the materialized query result
    spark.conf.set("materializationDataset", 'dataset')
    df = read_query_bigquery(project, query)
    df.show()
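One detail that may matter with this approach: the project ID here contains a hyphen, and wrapping the full table ID in backticks is the safe way to write it in BigQuery Standard SQL. A minimal sketch of building the query string that way follows; `quote_table` and `select_all_query` are hypothetical helper names, not part of the connector.

```python
def quote_table(table_id):
    """Wrap a project.dataset.table ID in backticks for BigQuery Standard SQL."""
    # Strip any existing backticks first so the ID is never double-quoted.
    return "`{}`".format(table_id.strip("`"))


def select_all_query(table_id, limit=None):
    """Build a SELECT * query over a backtick-quoted table ID."""
    query = "SELECT * FROM {}".format(quote_table(table_id))
    if limit is not None:
        query += " LIMIT {}".format(int(limit))
    return query


print(select_all_query("data-lake.dataset.member", limit=100))
```

The resulting string can be passed as the `query` argument in place of the hand-written one.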

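The BigQuery connector reads data through the BigQuery Storage API. At present this API does not support external tables, so the connector does not support reading them directly either.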
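Given that limitation, one option is to pick the read mode from the table type: a direct read for ordinary tables, and a query-based read for external ones. The sketch below is only an illustration under assumptions. `reader_options` and `read_table` are hypothetical names, and the `table_type` string is assumed to come from somewhere else, for example the `table_type` attribute returned by `Client.get_table` in the `google-cloud-bigquery` package. The `spark` argument is an existing SparkSession that already has the connector on its classpath.

```python
def reader_options(table_id, table_type, materialization_dataset):
    """Return connector options for a direct read or a query-based read.

    table_type is assumed to be a string such as "TABLE" or "EXTERNAL".
    """
    if table_type == "EXTERNAL":
        # External tables cannot go through the Storage API directly,
        # so let BigQuery run a query and read the materialized result.
        return {
            "query": "SELECT * FROM `{}`".format(table_id),
            "viewsEnabled": "true",
            "materializationDataset": materialization_dataset,
        }
    # Ordinary tables can be read directly by table ID.
    return {"table": table_id}


def read_table(spark, parent_project, table_id, table_type, materialization_dataset):
    """Apply the chosen options to an existing SparkSession (not created here)."""
    options = reader_options(table_id, table_type, materialization_dataset)
    return (
        spark.read.format("bigquery")
        .option("parentProject", parent_project)
        .options(**options)
        .load()
    )
```

Keeping the option selection in a plain function makes it easy to check without a cluster; only `read_table` touches Spark.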

