I am trying to solve a problem of pandas when I run python3.7 code on databricks.
The error is:
ImportError: cannot import name 'roperator' from 'pandas.core.ops' (/databricks/python/lib/python3.7/site-packages/pandas/core/ops.py)
the pandas version:
pd.__version__
0.24.2
I run
from pandas.core.ops import roperator
well on my laptop with
pandas 0.25.1
So, I tried to upgrade pandas on databricks.
%sh pip uninstall -y pandas
Successfully uninstalled pandas-1.1.2
%sh pip install pandas==0.25.1
Collecting pandas==0.25.1
Downloading pandas-0.25.1-cp37-cp37m-manylinux1_x86_64.whl (10.4 MB)
Requirement already satisfied: python-dateutil>=2.6.1 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (2.8.0)
Requirement already satisfied: numpy>=1.13.3 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (1.16.2)
Requirement already satisfied: pytz>=2017.2 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (2018.9)
Requirement already satisfied: six>=1.5 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas==0.25.1) (1.12.0)
Installing collected packages: pandas
ERROR: After October 2020 you may experience errors when installing or updating packages.
This is because pip will change the way that it resolves dependency conflicts.
We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
mlflow 1.8.0 requires alembic, which is not installed.
mlflow 1.8.0 requires prometheus-flask-exporter, which is not installed.
mlflow 1.8.0 requires sqlalchemy<=1.3.13, which is not installed.
sklearn-pandas 2.0.1 requires numpy>=1.18.1, but you'll have numpy 1.16.2 which is incompatible.
sklearn-pandas 2.0.1 requires pandas>=1.0.5, but you'll have pandas 0.25.1 which is incompatible.
sklearn-pandas 2.0.1 requires scikit-learn>=0.23.0, but you'll have scikit-learn 0.20.3 which is incompatible.
sklearn-pandas 2.0.1 requires scipy>=1.4.1, but you'll have scipy 1.2.1 which is incompatible.
Successfully installed pandas-0.25.1
When I run:
import pandas as pd
pd.__version__
it is still:
0.24.2
Did I missed something ?
thanks
It's really recommended to install libraries via cluster initialization script . The %sh
command is executed only on the driver node, but not on the executor nodes. And it also doesn't affect Python instance that is already running.
The correct solution will be to use dbutils.library
commands , like this:
dbutils.library.installPyPI("pandas", "1.0.1")
dbutils.library.restartPython()
this will install library to all places, but it will require restarting of the Python to pickup new libraries.
Also, although it's possible to specify only package name, it's recommended to specify version explicitly, as some of the library version may not be compatible with runtime. Also, consider usage of the newer runtimes where library versions are already updated - check therelease notes for runtimes to figure out the library versions installed out of the box.
For newer Databricks runtimes you can use new magic commands: %pip
and %conda
to install dependencies. See the documentation for more details.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.