Running Delta Lake in Python on Debian with standalone Spark

I want to use Delta Lake in Python. I installed standalone Spark and Anaconda on Debian 11.6.

The code I use to run Delta Lake is:

import pyspark
from delta import *

builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

# configure_spark_with_delta_pip adds the matching delta-core Maven package
# to spark.jars.packages so Spark can resolve it at session startup
spark = configure_spark_with_delta_pip(builder).getOrCreate()

But the above code produces this output:

:: loading settings :: url = jar:file:/usr/bin/spark-3.3.1-bin-hadoop3/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml

Ivy Default Cache set to: /home/boss/.ivy2/cache
The jars for the packages stored in: /home/boss/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-290d27e6-7e29-475f-81b5-1ab1331508fc;1.0
    confs: [default]
    found io.delta#delta-core_2.12;2.2.0 in central
    found io.delta#delta-storage;2.2.0 in central
    found org.antlr#antlr4-runtime;4.8 in central
:: resolution report :: resolve 272ms :: artifacts dl 10ms
    :: modules in use:
    io.delta#delta-core_2.12;2.2.0 from central in [default]
    io.delta#delta-storage;2.2.0 from central in [default]
    org.antlr#antlr4-runtime;4.8 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-290d27e6-7e29-475f-81b5-1ab1331508fc
    confs: [default]
    0 artifacts copied, 3 already retrieved (0kB/11ms)

23/01/24 04:10:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

How can I solve this problem?

This is really not an error, but rather:

  • debug information about fetching the necessary dependencies
  • a warning about the inability to load a library with native code; it doesn't prevent Spark from working, it just may be a bit slower because it falls back to the Java implementation. It can be solved by either installing the necessary native libraries or adding them to the search path. See this answer or this article for instructions; a quick sanity check is sketched below.
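
To confirm that the session is actually usable despite this output, here is a minimal sketch (assuming the same builder as in the question; the /tmp/delta-test path is only an illustrative location) that quiets the logging, as the startup message itself suggests, and writes and reads back a small Delta table:

import pyspark
from delta import configure_spark_with_delta_pip

# Build the session exactly as in the question
builder = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Reduce the startup noise, as the log message itself suggests
spark.sparkContext.setLogLevel("ERROR")

# Write and read back a tiny Delta table to verify the setup
spark.range(0, 5).write.format("delta").mode("overwrite").save("/tmp/delta-test")
spark.read.format("delta").load("/tmp/delta-test").show()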
