Python Version in Azure Databricks
I am trying to find out which Python version I am using in Databricks.
To find out, I tried:
import sys
print(sys.version)
And I got 3.7.3 as the output.
However, when I went to Cluster --> SparkUI --> Environment,
I saw that the cluster Python version is 2.
Which version does this refer to?
When I tried running
%sh python --version
I still get Python 3.7.3.
Can there be a different Python version for each worker / driver node?
Note: I am using a setup with 1 worker node and 1 driver node (2 nodes in total, with the same spec), and the Databricks Runtime Version is 6.5 ML.
Update: This issue has been fixed.
For new clusters: a newly created cluster will have the Python environment variable set to 3.
For existing clusters: add the following line in the Environment Variables tab under Cluster Configuration > Advanced, which changes the environment variable:
PYSPARK_PYTHON=/databricks/python3/bin/python3
Thanks for bringing this to our attention. This is a product bug; I am currently working with the product team to fix the issue as soon as possible.
The default Python version for clusters created using the UI is Python 3.
As part of the repro, I created a cluster with Databricks Runtime Version 6.5 ML and observed the same behaviour.
Cluster --> SparkUI --> Environment shows the incorrect version.
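To see what the executors actually run instead of relying on the Spark UI, something like the following minimal sketch may help. It assumes the `sc` SparkContext that Databricks notebooks predefine; the helper name `python_versions_on_executors` and its `num_tasks` parameter are hypothetical, made up for illustration. Spark decides where tasks run, so with only a few tasks there is no guarantee that every node is sampled.

```python
def python_versions_on_executors(sc, num_tasks=8):
    """Return the distinct Python versions reported by Spark tasks.

    `sc` is assumed to be a SparkContext (predefined in Databricks notebooks).
    `num_tasks` is an illustrative knob: more tasks make it more likely,
    but not certain, that every worker runs at least one of them.
    """
    def report(_):
        # Imported inside the task so it is evaluated by the interpreter
        # that runs the task, not by the driver.
        import platform
        return platform.python_version()

    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return rdd.map(report).distinct().collect()
```

In a notebook cell, `print(python_versions_on_executors(sc))` should list a single version if all sampled tasks ran under the same interpreter.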
I believe you are running a cluster that is using Databricks Runtime 5.5 or below. What you see when you run
import sys
print(sys.version)
is the Python version referred to by the PYSPARK_PYTHON environment variable. The one in Cluster --> SparkUI --> Environment is the Python version of the Ubuntu instance, which is Python 2.
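To compare the two on the driver, a small sketch like this may be useful. It prints the interpreter the notebook is running under next to whatever PYSPARK_PYTHON is set to. The helper name `describe_python` is hypothetical, and the variable may simply be unset in some environments, which the code reports as "<not set>".

```python
import os
import sys


def describe_python():
    """Collect the running interpreter's details and the PYSPARK_PYTHON setting."""
    return {
        # Path of the interpreter executing this code.
        "executable": sys.executable,
        # First token of sys.version is the version number itself.
        "version": sys.version.split()[0],
        # Environment variable PySpark consults to pick an interpreter.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON", "<not set>"),
    }


for key, value in describe_python().items():
    print(f"{key}: {value}")
```

If `executable` and `PYSPARK_PYTHON` point to the same path, the version printed here is the one PySpark is configured to use on that node.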
This works in all notebooks, whether Google Colab or MS Azure Databricks:
!python --version