
Python Version in Azure Databricks

I am trying to find out the Python version I am using in Databricks.

To find out, I tried

import sys
print(sys.version)

And I got 3.7.3 as the output.

However, when I went to Cluster --> Spark UI --> Environment,

I saw that the cluster Python version is 2.

Which version does this refer to?

When I tried running

%sh python --version

I still get Python 3.7.3

Can there be a different Python version for each worker / driver node?

Note: I am using a setup with 1 worker node and 1 driver node (2 nodes in total, with the same spec), and the Databricks Runtime version is 6.5 ML.
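
One way to check this directly from a notebook (a sketch, not from the original question; it assumes the SparkContext that Databricks exposes as sc) is to collect sys.version on the executors and compare it with the driver:

import sys
# Python version on the driver (the interpreter running this notebook)
driver_version = sys.version
# Run a tiny job so each executor reports its own Python version
executor_versions = set(
    sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
      .map(lambda _: sys.version)
      .collect()
)
print("driver:   ", driver_version)
print("executors:", executor_versions)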

Update: This issue has been fixed.

For new clusters: if you create a new cluster, the Python environment variable will already point to Python 3.

For existing clusters: you need to add the following environment variable under Cluster Configuration > Advanced Options > Environment Variables:

PYSPARK_PYTHON=/databricks/python3/bin/python3
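
After restarting the cluster, a quick way to confirm the variable took effect (a sketch, not part of the original update) is to print it from a notebook cell:

import os, sys
# PYSPARK_PYTHON should now point at /databricks/python3/bin/python3
print("PYSPARK_PYTHON =", os.environ.get("PYSPARK_PYTHON"))
print("sys.executable =", sys.executable)
print("sys.version    =", sys.version)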



Thanks for bringing this to our attention. This is a product bug; I'm currently working with the product team to fix the issue as soon as possible.

The default Python version for clusters created using the UI is Python 3.

As part of the repro, I created a cluster with Databricks Runtime Version 6.5 ML and observed the same behaviour.

Cluster --> Spark UI --> Environment shows an incorrect version.
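
For what it's worth, you can also check what the running SparkContext itself reports, independently of that UI page (a sketch, assuming the sc handle Databricks provides in notebooks):

import sys
# pythonVer is the Python major.minor version PySpark is actually using
print("SparkContext.pythonVer:", sc.pythonVer)   # e.g. 3.7
print("Notebook interpreter:  ", sys.version)    # e.g. 3.7.3 (...)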



I believe you are running a cluster that is using Databricks Runtime 5.5 or below. What you see when you run

import sys
print(sys.version)

is the Python version referred to by the PYSPARK_PYTHON environment variable. The one in Cluster --> Spark UI --> Environment is the Python version of the Ubuntu instance, which is Python 2.

Source
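
To see both interpreters side by side from a notebook, a possible check (a sketch; the /usr/bin/python path for the system interpreter is an assumption, not something stated above) is:

import os
import subprocess
# Interpreter PySpark is configured to use
pyspark_python = os.environ.get("PYSPARK_PYTHON", "python")
# Assumed location of the Ubuntu system Python that the Spark UI page reflects
system_python = "/usr/bin/python"
for interpreter in (pyspark_python, system_python):
    result = subprocess.run([interpreter, "--version"],
                            capture_output=True, text=True)
    # Python 2 prints --version to stderr, Python 3 to stdout
    print(interpreter, "->", (result.stdout or result.stderr).strip())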

This works in all notebooks, whether Google Colab or MS Azure Databricks:

!python --version
