
Pyspark write dataframe to bigquery [error gs]

I'm trying to write a dataframe to a BigQuery table. I have set up the SparkSession with the required parameters. However, when doing the write I get an error:

Py4JJavaError: An error occurred while calling o114.save.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)

The code is the following:

import findspark
findspark.init()

import pyspark
from pyspark.sql import SparkSession

spark2 = SparkSession.builder\
    .config("spark.jars", "/Users/xyz/Downloads/gcs-connector-hadoop2-latest.jar") \
    .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.18.0")\
    .config("google.cloud.auth.service.account.json.keyfile", "/Users/xyz/Downloads/MyProject-cd7627f8ef9b.json") \
    .getOrCreate()

spark2.conf.set("parentProject", "xyz")

data=spark2.createDataFrame(
    [
        ("AAA", 51), 
        ("BBB", 23),
    ],
    ['codiPuntSuministre', 'valor'] 
)

spark2.conf.set("temporaryGcsBucket","bqconsumptions")

data.write.format('bigquery') \
    .option("credentialsFile", "/Users/xyz/Downloads/MyProject-xyz.json")\
    .option('table', 'consumptions.c1') \
    .mode('append') \
    .save()

df=spark2.read.format("bigquery").option("credentialsFile", "/Users/xyz/Downloads/MyProject-xyz.json")\
    .load("consumptions.c1")

I don't get any error if I take the write out of the code, so the error comes when trying to write, and it may be related to the auxiliary bucket used to operate with BigQuery.

The error here suggests that Spark is not able to recognize the "gs" filesystem. You can use the configuration below to add support for it. The error happens because when you write to BigQuery, the files are first staged temporarily in a Google Cloud Storage bucket and then loaded from there into the BigQuery table.

# "fs.gs.impl" must point at the FileSystem implementation (GoogleHadoopFileSystem);
# GoogleHadoopFS is the AbstractFileSystem implementation and belongs under its own key.
spark2._jsc.hadoopConfiguration().set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark2._jsc.hadoopConfiguration().set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
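
Alternatively, the same filesystem settings can be applied when building the session, since Spark forwards any property prefixed with spark.hadoop.* into the Hadoop configuration. A minimal sketch, reusing the jar, package, and keyfile paths from the question:

from pyspark.sql import SparkSession

# Sketch: register the GCS connector for the "gs" scheme at session build time.
# The spark.hadoop.* prefix copies these options into the Hadoop configuration.
spark2 = SparkSession.builder \
    .config("spark.jars", "/Users/xyz/Downloads/gcs-connector-hadoop2-latest.jar") \
    .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.18.0") \
    .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem") \
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS") \
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true") \
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/Users/xyz/Downloads/MyProject-cd7627f8ef9b.json") \
    .getOrCreate()

With either approach, the write itself stays as in the question: data.write.format('bigquery') with a temporaryGcsBucket set, so the connector can stage the files in GCS before loading them into BigQuery.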
