
Cannot load spark-avro jars with Databricks version 10.4
I am currently facing a problem: the databricks-connect runtime on our cluster was updated to 10.4, and since then I can no longer load the jars for spark-avro. Running the following code
from pyspark.sql import SparkSession
spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0").getOrCreate()
I get the following error:
The jars for the packages stored in: C:\Users\lazlo\.ivy2\jars
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-dc011dfd-9d25-4d6f-9d0e-354626e7c1f8;1.0
confs: [default]
found org.apache.spark#spark-avro_2.12;3.3.0 in central
found org.tukaani#xz;1.8 in central
found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 156ms :: artifacts dl 4ms
:: modules in use:
org.apache.spark#spark-avro_2.12;3.3.0 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.tukaani#xz;1.8 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-dc011dfd-9d25-4d6f-9d0e-354626e7c1f8
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/5ms)
22/08/16 13:15:57 WARN Shell: Did not find winutils.exe: {}
...
Traceback (most recent call last):
File "C:/Aifora/repositories/test_poetry/tmp_jars.py", line 4, in <module>
spark = SparkSession.builder.config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0").getOrCreate()
File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\sql\session.py", line 229, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 400, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 147, in __init__
self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 210, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\pyspark\context.py", line 337, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\py4j\java_gateway.py", line 1568, in __call__
return_value = get_return_value(
File "C:\Users\lazlo\AppData\Local\pypoetry\Cache\virtualenvs\test-poetry-vvodToDL-py3.8\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
In case it matters: I use a Windows machine (Windows 11) and manage packages with Poetry. This is my pyproject.toml:
[tool.poetry]
name = "test_poetry"
version = "1.37.5"
description = ""
authors = [
"lazloo xp <lazloo.xp@xxx.com>",
]
[[tool.poetry.source]]
name = "xxx_nexus"
url = "https://nexus.infrastructure.xxxx.net/repository/pypi-all/simple/"
default = true
[tool.poetry.dependencies]
python = "==3.8.*"
databricks-connect = "^10.4"
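One thing worth checking in a setup like this is version alignment. The code above requests spark-avro 3.3.0, while Databricks documents Runtime 10.4 LTS as being based on Apache Spark 3.2.1, and the spark-avro artifact is normally expected to match the Spark version it runs on. The post does not establish that this mismatch caused the error. The sketch below, under stated assumptions, derives the package coordinate from the locally installed `pyspark.__version__` instead of hard-coding it. It assumes a Scala 2.12 build and a version string that starts with `major.minor.patch`. `avro_package` is a hypothetical helper, not part of PySpark or databricks-connect.

```python
import re


def avro_package(spark_version: str, scala_binary: str = "2.12") -> str:
    """Build a spark-avro Maven coordinate matching the given Spark version.

    Assumption: the artifact follows the usual
    org.apache.spark:spark-avro_<scala>:<spark> naming scheme.
    """
    # Keep only the leading major.minor.patch part (e.g. "3.2.1" from "3.2.1.dev0").
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {spark_version!r}")
    return f"org.apache.spark:spark-avro_{scala_binary}:{match.group(0)}"


if __name__ == "__main__":
    # Imported here so the helper above can be used without Spark installed.
    import pyspark
    from pyspark.sql import SparkSession

    package = avro_package(pyspark.__version__)
    print("Requesting", package)
    spark = SparkSession.builder.config("spark.jars.packages", package).getOrCreate()
```

Deriving the version at runtime keeps the jar in step with whatever PySpark distribution the virtual environment actually contains. Whether databricks-connect honours `spark.jars.packages` at all is a separate question this sketch does not answer.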
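After a week of research, I found a solution by comparing my environment variables with those on a colleague's Windows machine. It turned out that the following steps helped: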
Now everything works.
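The steps themselves are not listed in the post. As a hedged aid for the kind of comparison described (checking environment variables against a colleague's machine), the sketch below dumps a shortlist of variables to JSON on each machine and reports which ones differ. The shortlist is an assumption, not the author's fix. It includes `HADOOP_HOME` because the log's "Did not find winutils.exe" warning on Windows is commonly tied to that variable not pointing at a directory containing `bin\winutils.exe`. `snapshot` and `diff` are hypothetical helper names.

```python
import json
import os
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical shortlist of variables that commonly influence a local
# PySpark / databricks-connect setup on Windows. Extend as needed.
SPARK_RELATED_VARS = (
    "JAVA_HOME",
    "SPARK_HOME",
    "HADOOP_HOME",
    "PYSPARK_PYTHON",
    "PYSPARK_DRIVER_PYTHON",
)


def snapshot(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Return the value of each listed variable, or None if it is unset."""
    return {name: env.get(name) for name in SPARK_RELATED_VARS}


def diff(
    mine: Mapping[str, Optional[str]], theirs: Mapping[str, Optional[str]]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (mine, theirs)} for every variable whose value differs."""
    names = sorted(set(mine) | set(theirs))
    return {
        name: (mine.get(name), theirs.get(name))
        for name in names
        if mine.get(name) != theirs.get(name)
    }


if __name__ == "__main__":
    # Run on each machine, save the output, then load both files and call diff().
    print(json.dumps(snapshot(), indent=2))
```

Comparing JSON snapshots rather than eyeballing the Windows "Environment Variables" dialog makes differences such as an unset `HADOOP_HOME` or a second JDK on `JAVA_HOME` easy to spot.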