
Spark Read BigQuery External Table

Trying to read an external table from BigQuery but getting an error:

    SCALA_VERSION="2.12"
    SPARK_VERSION="3.1.2"
    com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0,
    com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2

    table = 'data-lake.dataset.member'
    df = spark.read.format('bigquery').load(table)
    df.printSchema()

Result:

root
  |-- createdAtmetadata: date (nullable = true)
  |-- eventName: string (nullable = true)
  |-- producerName: string (nullable = true)

So when I run:

df.createOrReplaceTempView("member")
spark.sql("select * from member limit 100").show()

I get this error message:

INVALID_ARGUMENT: request failed: Only external tables with connections can be read with the Storage API.

As external tables are not supported in queries by Spark, I tried the other way and it worked:

def read_query_bigquery(project, query):
    df = spark.read.format('bigquery') \
        .option("parentProject", "{project}".format(project=project)) \
        .option('query', query) \
        .option('viewsEnabled', 'true') \
        .load()

    return df

project = 'data-lake'
query = 'select * from data-lake.dataset.member'
spark.conf.set("materializationDataset",'dataset')
df = read_query_bigquery(project, query)
df.show()

The bigquery connector uses the BigQuery Storage API to read the data. At the moment this API does not support external tables, so the connector doesn't support them either.
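
A workaround (the one used in the question) is to go through the query interface instead of loading the table directly: with viewsEnabled the connector runs the query in BigQuery, materializes the result into a temporary table in the materialization dataset, and reads that table with the Storage API. A minimal sketch, assuming the project/dataset/table names from the question and that the connector options behave as documented:

    # Query-based workaround: run a query against the external table and let the
    # connector materialize the result into a regular (readable) temporary table.
    # 'data-lake', 'dataset' and 'member' are the names used in the question.
    project = 'data-lake'
    query = 'select * from `data-lake.dataset.member`'

    df = (spark.read.format('bigquery')
          .option('parentProject', project)             # project billed for the query
          .option('viewsEnabled', 'true')               # enable reading through a query/view
          .option('materializationDataset', 'dataset')  # dataset that holds the temporary result table
          .option('query', query)
          .load())

    df.show()

Keep in mind the account running the job needs permission to create tables in the materialization dataset, since the query result is written there before it is read.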

