Spark Read BigQuery External Table
Trying to read an external table from BigQuery, but getting an error.
SCALA_VERSION="2.12"
SPARK_VERSION="3.1.2"
com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0
com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2
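(For context: these connector coordinates are typically supplied when launching Spark. A minimal sketch of a `pyspark` invocation using the versions above; the exact launch command is not shown in the question:)

```shell
pyspark \
  --packages com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0,com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2
```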
table = 'data-lake.dataset.member'
df = spark.read.format('bigquery').load(table)
df.printSchema()
Result:
root
|-- createdAtmetadata: date (nullable = true)
|-- eventName: string (nullable = true)
|-- producerName: string (nullable = true)
So when I run:
df.createOrReplaceTempView("member")
spark.sql("select * from member limit 100").show()
I got this error message:
INVALID_ARGUMENT: request failed: Only external tables with connections can be read with the Storage API.
Since external tables are not supported in Spark queries, I tried the other way and it worked:
def read_query_bigquery(project, query):
    df = spark.read.format('bigquery') \
        .option("parentProject", "{project}".format(project=project)) \
        .option('query', query) \
        .option('viewsEnabled', 'true') \
        .load()
    return df

project = 'data-lake'
# a hyphenated project id must be backtick-quoted in BigQuery standard SQL
query = 'select * from `data-lake.dataset.member`'
spark.conf.set("materializationDataset",'dataset')
df = read_query_bigquery(project, query)
df.show()
The BigQuery connector uses the BigQuery Storage API to read the data. At the moment this API does not support external tables, so the connector does not support them either.
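To summarize the workaround above as one piece: read through a query with `viewsEnabled` and `materializationDataset` set, instead of loading the external table directly. This is a minimal sketch; the project, dataset, and table names are the placeholders from the question, and `spark` is assumed to be an existing SparkSession with the spark-bigquery connector on its classpath.

```python
# Sketch of the query-based workaround for external tables
# (direct table loads go through the Storage API, which rejects them).

def build_member_query(project, dataset, table):
    # Hyphenated project ids (e.g. "data-lake") must be backtick-quoted
    # in BigQuery standard SQL.
    return f"select * from `{project}.{dataset}.{table}`"

def read_query_bigquery(spark, project, query, materialization_dataset):
    # viewsEnabled + materializationDataset let the connector run the query
    # in BigQuery and read the materialized result instead.
    return (spark.read.format("bigquery")
            .option("parentProject", project)
            .option("viewsEnabled", "true")
            .option("materializationDataset", materialization_dataset)
            .option("query", query)
            .load())

query = build_member_query("data-lake", "dataset", "member")
# df = read_query_bigquery(spark, "data-lake", query, "dataset")
# df.show()
```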