
How to write and read delta format with Zeppelin? (Delta Lake installed and imported)

I have installed the Delta Lake package (delta-spark) in the Zeppelin environment and added the delta-core dependency to the Spark interpreter: io.delta:delta-core_2.12:1.0.0.


Spark: 3.1.1
Scala: 2.12.10

However, whenever I try to write or read data in Delta format, it throws errors. Does anyone know what might be going wrong here or how to fix it? Thanks!

%spark.pyspark
from pyspark.sql import SparkSession
from delta import *  # provides configure_spark_with_delta_pip

builder = SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

data = spark.range(0, 5)
data.write.format("delta").save("hdfs://my-hdfs-namenode-0.my-hdfs-namenode.hdfs-explore.svc.cluster.local/tmp/delta-table-1")

I guess the main issue here is that your cluster doesn't see the Delta Lake jar file. If you added the dependency via the Zeppelin Spark interpreter, it only takes effect inside Apache Zeppelin itself. You should upload the Delta Lake jar to /usr/lib/spark/jars/ on your cluster. The jar file can be downloaded from here.
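If copying the jar by hand is inconvenient, a minimal sketch of an alternative is to have Spark fetch the package itself via spark.jars.packages. This assumes the cluster can reach Maven Central, reuses the io.delta:delta-core_2.12:1.0.0 coordinate from the question, and writes to a stand-in /tmp/delta-table-1 path rather than your HDFS URL. In Zeppelin these settings usually belong in the interpreter configuration rather than the notebook paragraph, because a SparkSession may already exist by the time the paragraph runs.

%spark.pyspark
# Sketch: ask Spark to download the Delta package instead of copying the jar manually.
# Assumes driver and executors can download from Maven Central; the save path below
# is a hypothetical stand-in for the HDFS path used in the question.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("MyApp")
         .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Write a small Delta table, then read it back to confirm the setup works.
spark.range(0, 5).write.format("delta").mode("overwrite").save("/tmp/delta-table-1")
spark.read.format("delta").load("/tmp/delta-table-1").show()

Whichever route you take, the key point is the same: the Delta jar has to be visible to the executors on the cluster, not just to the Zeppelin process.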
