I am trying to find out which Python version I am using in Databricks.
To find out, I ran:
import sys
print(sys.version)
The output was 3.7.3.
However, when I went to Cluster --> SparkUI --> Environment,
I saw that the cluster Python version is 2.
Which version does this refer to?
When I tried running
%sh python --version
I still got Python 3.7.3.
Can there be a different Python version for each worker / driver node?
Note: I am using a setup with 1 worker node and 1 driver node (2 nodes in total with the same spec), and the Databricks Runtime Version is 6.5 ML.
Update: This issue has been fixed.
For new clusters: A newly created cluster will have the Python environment variable set to Python 3.
For existing clusters: Add the following line in the Environment Variables tab under Cluster Configuration > Advanced, which updates the environment variable:
PYSPARK_PYTHON=/databricks/python3/bin/python3
Thanks for bringing this to our attention. This is a product bug, and I am currently working with the product team to fix the issue as soon as possible.
The default Python version for clusters created using the UI is Python 3.
To reproduce the issue, I created a cluster with Databricks Runtime Version 6.5 ML and observed the same behaviour.
Cluster --> SparkUI --> Environment shows an incorrect version.
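Since the Environment page can be misleading here, one way to see what the workers actually run is to ask them from a notebook. The following is only a minimal sketch: the helper names interpreter_version and worker_python_versions are hypothetical, and it assumes the notebook provides a SparkContext as sc, as Databricks notebooks normally do.

```python
import sys


def interpreter_version():
    """Return the running interpreter's version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


def worker_python_versions(sc, num_tasks=4):
    """Collect the distinct Python versions reported by tasks run through `sc`.

    `sc` is assumed to be a SparkContext; each task reports the version of
    the Python process that executed it.
    """
    return sorted(
        sc.range(num_tasks, numSlices=num_tasks)
        .map(lambda _: interpreter_version())
        .distinct()
        .collect()
    )


# Version of the Python process running this code (the driver in a notebook).
print("driver:", interpreter_version())

# In a Databricks notebook, where `sc` is predefined, the workers can be checked with:
# print("workers:", worker_python_versions(sc))
```

If both calls report the same version, the driver and the workers are using the same Python, regardless of what the Environment page shows.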
I believe you are running a cluster that is using Databricks Runtime 5.5 or below. What you see when you run
import sys
print(sys.version)
is the Python version referenced by the PYSPARK_PYTHON environment variable. The one in Cluster --> SparkUI --> Environment is the Python version of the Ubuntu instance, which is Python 2.
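To see which interpreter the notebook is running and what PYSPARK_PYTHON points to, something like the following may help. It is a sketch using only the standard library; describe_python is a hypothetical helper name, and the variable may simply be unset in the process where you run it.

```python
import os
import sys


def describe_python():
    """Summarise the interpreter running this code and the PYSPARK_PYTHON setting."""
    return {
        # First token of sys.version is the bare version number.
        "version": sys.version.split()[0],
        # Full path of the interpreter binary executing this code.
        "executable": sys.executable,
        # Interpreter path PySpark is configured to use, if the variable is set.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON", "<not set>"),
    }


for key, value in describe_python().items():
    print(f"{key}: {value}")
```

Comparing the executable path with the PYSPARK_PYTHON value shows whether the notebook interpreter is the one that variable refers to.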
This works in all notebooks, whether Google Colab or MS Azure Databricks:
!python --version