
Running Delta Lake in Python on Debian with standalone Spark

I want to use Delta Lake from Python. I installed Spark in standalone mode and Anaconda on Debian 11.6.

The code I use to start a Delta Lake session is:

import pyspark
from delta import *

builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

But the code above produces this error:

:: loading settings :: url = jar:file:/usr/bin/spark-3.3.1-bin-hadoop3/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml

Ivy Default Cache set to: /home/boss/.ivy2/cache
The jars for the packages stored in: /home/boss/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-290d27e6-7e29-475f-81b5-1ab1331508fc;1.0
    confs: [default]
    found io.delta#delta-core_2.12;2.2.0 in central
    found io.delta#delta-storage;2.2.0 in central
    found org.antlr#antlr4-runtime;4.8 in central
:: resolution report :: resolve 272ms :: artifacts dl 10ms
    :: modules in use:
    io.delta#delta-core_2.12;2.2.0 from central in [default]
    io.delta#delta-storage;2.2.0 from central in [default]
    org.antlr#antlr4-runtime;4.8 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-290d27e6-7e29-475f-81b5-1ab1331508fc
    confs: [default]
    0 artifacts copied, 3 already retrieved (0kB/11ms)

23/01/24 04:10:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

How can I solve this problem?

This is not really an error. The output consists of:

  • debug information from Ivy about fetching the necessary dependencies (the Delta Lake jars)
  • a warning that the native Hadoop library could not be found. This does not prevent Spark from working; it may just run a bit slower because the built-in Java implementation is used instead. It can be resolved by either installing the necessary libraries or adding them to the search path. See this answer or this article for instructions
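One way to confirm that the session really works despite these messages is to write a small Delta table and read it back. The following is only a minimal sketch: the function names `build_delta_session` and `delta_round_trip` are made up for illustration, the session settings are copied from the question, and it assumes `pyspark`, `delta-spark` and a Java runtime are installed. It also lowers the log level with `setLogLevel`, which the log output itself suggests. This only affects messages logged after the session exists, so the Ivy output and the `NativeCodeLoader` warning printed during startup will most likely still appear.

```python
def build_delta_session(app_name="MyApp"):
    """Create a Delta-enabled SparkSession using the settings from the question."""
    # Imported here so the helper below can be used without pyspark installed.
    import pyspark
    from delta import configure_spark_with_delta_pip

    builder = (
        pyspark.sql.SparkSession.builder.appName(app_name)
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config(
            "spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        )
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()
    # Hide WARN-level messages logged from this point on.
    spark.sparkContext.setLogLevel("ERROR")
    return spark


def delta_round_trip(spark, path):
    """Write a five-row Delta table to `path`, read it back, return the row count."""
    spark.range(5).write.format("delta").mode("overwrite").save(path)
    return spark.read.format("delta").load(path).count()


# Example use (the temporary directory is just a placeholder location):
#   import tempfile
#   spark = build_delta_session()
#   print(delta_round_trip(spark, tempfile.mkdtemp()))
```

If the round trip returns 5 without raising an exception, Delta Lake is set up correctly and the messages shown in the question can be ignored.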
