How to read a BigQuery table from Java Spark with the BigQuery connector
I am trying to read a BigQuery table from Spark Java code as follows:
BigQuerySQLContext bqSqlCtx = new BigQuerySQLContext(sqlContext);
bqSqlCtx.setGcpJsonKeyFile("sxxxl-gcp-1x4c0xxxxxxx.json");
bqSqlCtx.setBigQueryProjectId("winged-standard-2xxxx");
bqSqlCtx.setBigQueryDatasetLocation("asia-east1");
bqSqlCtx.setBigQueryGcsBucket("dataproc-9cxxxxx39-exxdc-4e73-xx07-2258xxxx4-asia-east1");
Dataset<Row> testds = bqSqlCtx.bigQuerySelect("select * from bqtestdata.customer_visits limit 100");
But I am running into the following error:
19/01/14 10:52:01 WARN org.apache.spark.sql.SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
19/01/14 10:52:01 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing query select * from bqtestdata.customer_visits limit 100
19/01/14 10:52:02 INFO com.samelamin.spark.bigquery.BigQueryClient: Creating staging dataset winged-standard-2xxxxx:spark_bigquery_staging_asia-east1
Exception in thread "main" java.util.concurrent.ExecutionException: com.google.api.client.googleapis.json.GoogleJsonResponseException:
400 Bad Request
{
"code" : 400,
"errors" :
[ {
"domain" : "global",
"message" : "Invalid dataset ID \"spark_bigquery_staging_asia-east1\". Dataset IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long.",
"reason" : "invalid"
} ],
"message" : "Invalid dataset ID \"spark_bigquery_staging_asia-east1\". Dataset IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long.",
"status" : "INVALID_ARGUMENT"
}
The message in the response,
Dataset IDs must be alphanumeric (plus underscores)...
indicates that the dataset ID "spark_bigquery_staging_asia-east1" is invalid, because it contains hyphens, specifically in asia-east1.
I ran into a similar problem with samelamin's Scala library. Apparently this happens because the library cannot handle locations outside the US and EU, so it fails to access datasets in asia-east1.
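The constraint quoted in the error message can be checked locally. Below is a minimal sketch (the class and method names are my own, not part of any library) that validates a dataset ID against the rule stated by BigQuery: only letters, digits, and underscores, at most 1024 characters. The staging dataset name the library generates fails this check because of the hyphen in asia-east1:

```java
import java.util.regex.Pattern;

public class DatasetIdCheck {
    // Rule from the error message: alphanumeric plus underscores, max 1024 chars.
    static final Pattern VALID_DATASET_ID = Pattern.compile("^[A-Za-z0-9_]{1,1024}$");

    static boolean isValidDatasetId(String id) {
        return VALID_DATASET_ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        // The name generated by the library: rejected because of the hyphen.
        System.out.println(isValidDatasetId("spark_bigquery_staging_asia-east1"));
        // The same name with an underscore instead: accepted.
        System.out.println(isValidDatasetId("spark_bigquery_staging_asia_east1"));
    }
}
```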
For now, I use the BigQuery Spark connector (spark-bigquery-connector) to load data from and write data back to BigQuery.
If you find a workaround that works with this library, please share it as well.
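As a sketch of that workaround (the project, dataset, table, and bucket names below are placeholders, not values from the question), reading and writing with the spark-bigquery-connector typically looks like this; the connector reads the table directly rather than staging query results in a generated dataset, which sidesteps the invalid-ID problem:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BigQueryConnectorExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("bq-connector-example")
                .getOrCreate();

        // Read: the connector loads the table directly from BigQuery storage.
        Dataset<Row> visits = spark.read()
                .format("bigquery")
                .option("table", "my-project.bqtestdata.customer_visits")
                .load();

        // Write: an indirect write stages files in a GCS bucket you control,
        // then loads them into BigQuery.
        visits.limit(100).write()
                .format("bigquery")
                .option("table", "my-project.bqtestdata.customer_visits_sample")
                .option("temporaryGcsBucket", "my-staging-bucket")
                .mode("overwrite")
                .save();
    }
}
```

This requires the spark-bigquery-connector jar on the classpath and GCP credentials (e.g. via GOOGLE_APPLICATION_CREDENTIALS), so it is a sketch rather than a self-contained runnable example.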