
How to fix org.apache.spark.sql.internal.SQLConf$.PARQUET_FIELD_ID_READ_ENABLED() when running Spark with Delta Lake?

I am following the tutorial here on how to access a Delta Lake house with Spark, but I can't get it to work.

I have the following dependencies:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>Runner</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.2.3</version>
        </dependency>
    </dependencies>

</project>

My code is:

package org.example;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
public class Main {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
                .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
                .config("spark.master", "local")
                .getOrCreate();

        Dataset<Row> df = spark.range(0, 5).toDF();
        df.write().format("delta").save("./tmp/delta-table");
        df.show();
    }
}

But when I run it, it fails with the following error:

Exception in thread "main" java.lang.NoSuchMethodError: 'org.apache.spark.internal.config.ConfigEntry org.apache.spark.sql.internal.SQLConf$.PARQUET_FIELD_ID_READ_ENABLED()'
    at io.delta.sql.DeltaSparkSessionExtension.$anonfun$apply$3(DeltaSparkSessionExtension.scala:88)
    at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildResolutionRules$1(SparkSessionExtensions.scala:174)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.sql.SparkSessionExtensions.buildResolutionRules(SparkSessionExtensions.scala:174)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.customResolutionRules(BaseSessionStateBuilder.scala:212)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:187)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:179)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:357)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:87)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:87)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:75)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:75)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:73)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:65)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:205)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:211)
    at org.apache.spark.sql.SparkSession.range(SparkSession.scala:550)
    at org.apache.spark.sql.SparkSession.range(SparkSession.scala:529)
    at org.example.Main.main(Main.java:16)

Googling how to fix this did not turn up much. Any ideas?

You are using io.delta:delta-core_2.12 version 2.2.0 with spark-core version 3.2.3, but delta-core 2.2.0 is built against spark-core version 3.3.1.

In your case it is looking for the PARQUET_FIELD_ID_READ_ENABLED config entry, which was only introduced in Spark 3.3.0. That is what causes your problem.
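
One way to confirm the mismatch locally (a sketch I am adding here, not part of the original answer; the class name VersionCheck is hypothetical) is to print the Spark version that actually ends up on the classpath, using a plain session without the Delta extension so it starts even with the mismatch:

package org.example;
import org.apache.spark.sql.SparkSession;
public class VersionCheck {
    public static void main(String[] args) {
        // Plain session, no Delta extension, so this runs even when
        // the Delta/Spark versions are mismatched.
        SparkSession spark = SparkSession
                .builder()
                .appName("version check")
                .config("spark.master", "local")
                .getOrCreate();
        // Delta Lake 2.2.0 expects 3.3.x here; with the POM above this
        // prints 3.2.3, which is why the NoSuchMethodError appears.
        System.out.println(spark.version());
        spark.stop();
    }
}

Alternatively, mvn dependency:tree shows the versions Maven resolves.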

Try using Spark version 3.3.1 :)
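
For reference, a minimal sketch of the corrected dependency block (my assumption, based on the compatibility note above: keep delta-core 2.2.0 and bump both Spark artifacts to 3.3.1):

    <dependencies>
        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <!-- Delta Lake 2.2.0 is built against Spark 3.3.x -->
            <version>3.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.3.1</version>
        </dependency>
    </dependencies>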
