
Python Version in Azure Databricks

I am trying to find out which Python version I am using in Databricks.

To find out, I tried:

import sys
print(sys.version)

And I got 3.7.3 as the output.

However, when I went to Cluster --> SparkUI --> Environment,

I saw that the cluster Python version is 2.

Which version does this refer to?

When I ran

%sh python --version

I still got Python 3.7.3.

Can each worker/driver node have a different Python version?
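One way to check this empirically is to ask the driver and the executors for their interpreter versions. The sketch below assumes a PySpark `SparkContext` passed in as `sc` (Databricks notebooks predefine one); the function names and the default of 8 tasks are hypothetical choices for illustration, and nothing guarantees that the tasks land on every worker.

```python
import sys


def _interpreter_version(_):
    # Runs inside a Spark task, so it reports the Python of the executor
    # that picked up the task, not the driver's interpreter.
    import sys
    return "%d.%d.%d" % tuple(sys.version_info[:3])


def python_versions_by_role(sc, num_tasks=8):
    """Return the driver's Python version and the distinct executor versions.

    `sc` is assumed to be a PySpark SparkContext; `num_tasks` is an
    illustrative default meant to spread tasks across available workers.
    """
    driver_version = "%d.%d.%d" % tuple(sys.version_info[:3])
    worker_versions = (
        sc.parallelize(range(num_tasks), num_tasks)  # one element per partition
        .map(_interpreter_version)
        .distinct()
        .collect()
    )
    return {"driver": driver_version, "workers": sorted(worker_versions)}
```

In a notebook you would call `print(python_versions_by_role(sc))`; a single entry under `workers` that equals `driver` suggests all sampled nodes run the same Python.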

Note: I am using a setup with 1 worker node and 1 driver node (2 nodes in total, both with the same spec), and the Databricks Runtime Version is 6.5 ML.

Update: This issue has been fixed.

For new clusters: if you create a new cluster, its Python environment variable will point to Python 3.

For existing clusters: add the following line in the Environment Variables tab under Cluster Configuration > Advanced to change the environment variable:

PYSPARK_PYTHON=/databricks/python3/bin/python3
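After restarting the cluster, a quick notebook check can confirm the variable took effect. This is a minimal sketch that assumes the cluster's environment variables are visible to the notebook's driver process; the helper name `pyspark_python_status` is hypothetical.

```python
import os

# Path recommended in the answer above; adjust it if your runtime differs.
EXPECTED_PYSPARK_PYTHON = "/databricks/python3/bin/python3"


def pyspark_python_status(expected=EXPECTED_PYSPARK_PYTHON):
    """Report whether PYSPARK_PYTHON in this process points at `expected`."""
    configured = os.environ.get("PYSPARK_PYTHON")  # None if the variable is unset
    return {
        "configured": configured,
        "matches_expected": configured == expected,
    }


print(pyspark_python_status())
```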



Thanks for bringing this to our attention. This is a product bug; I'm currently working with the product team to fix the issue as soon as possible.

The default Python version for clusters created using the UI is Python 3.

To reproduce the issue, I created a cluster with Databricks Runtime Version 6.5 ML and observed the same behavior.

Cluster --> SparkUI --> Environment shows the incorrect version.


I believe you are running a cluster on Databricks Runtime 5.5 or below. What you see when you run

import sys
print(sys.version)

is the Python version referenced by the PYSPARK_PYTHON environment variable. The one shown in Cluster --> SparkUI --> Environment is the Python version of the underlying Ubuntu instance, which is Python 2.
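To see the notebook interpreter's details next to the variable the answer mentions, the sketch below prints them side by side. It only reports what the running process sees; the helper name `describe_notebook_python` is hypothetical, and it does not read the Spark UI value.

```python
import os
import platform
import sys


def describe_notebook_python():
    """Collect details about the interpreter executing this notebook cell."""
    return {
        "version": platform.python_version(),  # e.g. the 3.7.3 seen via sys.version
        "executable": sys.executable,  # path of the running interpreter
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),  # None if unset
    }


for key, value in describe_notebook_python().items():
    print(f"{key}: {value}")
```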

Source

This works in any notebook, whether Google Colab or MS Azure Databricks:

!python --version
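Note that `!python --version` (and `%sh python --version`) runs whichever `python` resolves on the shell's PATH, which is not necessarily the interpreter running the notebook. A sketch of the same check from plain Python, assuming only the standard library; the helper name `python_on_path_version` is hypothetical:

```python
import shutil
import subprocess


def python_on_path_version(command="python"):
    """Return the `--version` output of whichever `command` resolves on PATH."""
    executable = shutil.which(command)
    if executable is None:
        return None  # nothing named `command` on PATH
    completed = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    # Python 2 prints its version to stderr; Python 3.4+ prints to stdout.
    return (completed.stdout or completed.stderr).strip()


print(python_on_path_version())
```

Comparing this output with `sys.version` inside the notebook shows whether the shell's `python` and the notebook's interpreter agree.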

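Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.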

© 2020-2024 STACKOOM.COM