![](/img/trans.png)
[英]How to fix "java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" Pyspark
[英]How to fix org.apache.spark.sql.internal.SQLConf$.PARQUET_FIELD_ID_READ_ENABLED() when running Spark with Delta Lake?
我正在按照這里關於如何使用 spark 訪問 Delta Lake house 的教程進行操作,但似乎無法正常工作。
我有以下依賴項:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>Runner</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.2.3</version>
</dependency>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>2.2.0</version>
</dependency>
</dependencies>
</project>
我的代碼是:
package org.example;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
public class Main {
public static void main(String[] args) {
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic example")
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
.config("spark.master", "local")
.getOrCreate();
Dataset<Row> df = spark.range(0, 5).toDF();
df.write().format("delta").save("./tmp/delta-table");
df.show();
}
}
但是當我運行它時,它失敗並出現以下錯誤:
Exception in thread "main" java.lang.NoSuchMethodError: 'org.apache.spark.internal.config.ConfigEntry org.apache.spark.sql.internal.SQLConf$.PARQUET_FIELD_ID_READ_ENABLED()'
at io.delta.sql.DeltaSparkSessionExtension.$anonfun$apply$3(DeltaSparkSessionExtension.scala:88)
at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildResolutionRules$1(SparkSessionExtensions.scala:174)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.SparkSessionExtensions.buildResolutionRules(SparkSessionExtensions.scala:174)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.customResolutionRules(BaseSessionStateBuilder.scala:212)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:187)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:179)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:357)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:87)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:87)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:75)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:75)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:65)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:205)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:211)
at org.apache.spark.sql.SparkSession.range(SparkSession.scala:550)
at org.apache.spark.sql.SparkSession.range(SparkSession.scala:529)
at org.example.Main.main(Main.java:16)
谷歌搜索如何修復並沒有產生太多結果。 有任何想法嗎?
您正在使用io.delta.delta-core_2.12.2.2.0
和 spark-core 版本3.2.0
,但它的依賴項是 spark-core 版本3.3.1
。
在您的情況下,它正在尋找PARQUET_FIELD_ID_READ_ENABLED
配置參數,但這僅在版本3.3.0
中引入。 那導致你的問題。
嘗試使用 Spark 版本3.3.1
:)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.