How to specify a Python version in a Databricks cluster
I am trying to install a wheel on a Databricks cluster. Unfortunately this wheel has the requirement:

python_requires='==3.6.8'

On Databricks clusters version 3.7.3 is used, so the installation of the wheel fails. How can I install a lower Python version on those clusters?
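To confirm which interpreter a cluster is actually running before debugging further, a quick check in a notebook cell is enough (a minimal sketch; the exact version string depends on the Databricks Runtime):

```python
import sys

# Print just the interpreter version, e.g. "3.7.3" on the cluster in question.
# pip will reject a wheel built with python_requires='==3.6.8' unless this
# matches exactly.
print(sys.version.split()[0])
```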
What I tried:
Switch to an Anaconda-supported cluster and create a virtualenv with the specific version in the init script --> this runs into an error which stops the cluster from starting (based on this: https://docs.databricks.com/runtime/mlruntime.html).
Is there another way to set up a virtualenv which can be used on all nodes of the cluster?
Thanks!
Update
So I tried the next thing:
I created an init script:
#!/bin/bash
wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
tar xvf Python-3.6.8.tgz
ls
pwd
cd Python-3.6.8
ls
pwd
./configure --enable-optimizations --enable-shared
make -j8
sudo make altinstall
python3.6
to install Python 3.6.8 on the cluster (this takes quite a while).
The init script fails --> here is the error log:
find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
make[1]: [clean] Error 1 (ignored)
Executing <Task finished coro=<CoroutineTests.test_async_def_wrapped.<locals>.start() done, defined at /Python-3.6.8/Lib/test/test_asyncio/test_pep492.py:150> result=None created at /Python-3.6.8/Lib/asyncio/base_events.py:463> took 0.168 seconds
stty: 'standard input': Inappropriate ioctl for device
/Python-3.6.8/Modules/expat/xmlparse.c: In function ‘appendAttributeValue’:
/Python-3.6.8/Modules/expat/xmlparse.c:5577:40: warning: array subscript is above array bounds [-Warray-bounds]
if (!poolAppendChar(pool, buf[i]))
^
/Python-3.6.8/Modules/expat/xmlparse.c:545:27: note: in definition of macro ‘poolAppendChar’
: ((*((pool)->ptr)++ = c), 1))
^
/Python-3.6.8/Modules/expat/xmlparse.c:5577:40: warning: array subscript is above array bounds [-Warray-bounds]
if (!poolAppendChar(pool, buf[i]))
^
/Python-3.6.8/Modules/expat/xmlparse.c:545:27: note: in definition of macro ‘poolAppendChar’
: ((*((pool)->ptr)++ = c), 1))
^
python3.6: error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory
The unpacking and loading of the tar works fine; however, after the second ls/pwd the errors occurred. In general, Python does get installed "somewhere" so far. How can I redirect the installation so that it ends up at /databricks/python3/bin/python?

Thank you!
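For reference, the final error above ("cannot open shared object file") typically means the dynamic linker cache was not refreshed after `make altinstall` with `--enable-shared`, and the install location is controlled by `./configure --prefix`. A hedged sketch of the relevant part of the init script follows; the `/databricks/python3` prefix is taken from the question and should be verified against your runtime:

```shell
#!/bin/bash
set -e

# Build into the prefix the cluster expects instead of the default /usr/local.
# NOTE: /databricks/python3 is the path asked about above; confirm it on your DBR.
cd Python-3.6.8
./configure --enable-optimizations --enable-shared --prefix=/databricks/python3
make -j8
sudo make altinstall

# --enable-shared installs libpython3.6m.so.1.0 under the prefix's lib/ dir;
# register that directory with the dynamic linker so "python3.6" can find it.
echo "/databricks/python3/lib" | sudo tee /etc/ld.so.conf.d/python36.conf
sudo ldconfig

# Sanity check the freshly installed interpreter.
/databricks/python3/bin/python3.6 --version
```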
Something like this should work in the init script:
#!/bin/bash
sudo wget https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-x86_64.sh
sudo bash Anaconda3-5.2.0-Linux-x86_64.sh -b -p /anaconda3
echo "PYSPARK_PYTHON=/anaconda3/bin/python3" >> /databricks/spark/conf/spark-env.sh
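Since the wheel pins exactly 3.6.8 and the Anaconda base environment may ship a different 3.6.x patch release, one possible variant (an untested sketch, assuming `python=3.6.8` is resolvable from conda's default channel) is to create a dedicated environment and point PYSPARK_PYTHON at it:

```shell
#!/bin/bash
# Assumes /anaconda3 was installed by the lines above.
# Create an environment pinned to the exact interpreter the wheel requires.
/anaconda3/bin/conda create -y -n py368 python=3.6.8

# Point Spark workers at the pinned interpreter.
echo "PYSPARK_PYTHON=/anaconda3/envs/py368/bin/python3" >> /databricks/spark/conf/spark-env.sh
```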
DBR 5.5 should have Python 3.6 as well; you could try using that version.
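If you go that route, the runtime can be pinned when creating the cluster, for example with the Databricks CLI. This is a sketch under assumptions: the `spark_version` string `5.5.x-scala2.11` and the node type are illustrative values; list what your workspace actually offers with `databricks clusters spark-versions` first.

```shell
# Create a cluster pinned to DBR 5.5 (Python 3.6) via the Databricks CLI.
# cluster_name, node_type_id, and num_workers are placeholders; adjust them
# for your workspace and cloud provider.
databricks clusters create --json '{
  "cluster_name": "py36-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2
}'
```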