Import error with "import pyspark.pandas"

This is part of new coursework I am doing. I am trying to install pyspark and I intend to use pyspark.pandas. I tried to check my installation with the following imports.

import pandas as pd

import numpy as np

import pyspark.pandas as ps

But when I run the imports, I see the error below.

ImportError: cannot import name 'print_exec' from 'pyspark.cloudpickle' (C:\Users\smith\Anaconda3\lib\site-packages\pyspark\cloudpickle\__init__.py)

The pyspark version I am using is 3.1.3. I am not sure, but I may have set the paths incorrectly. Is there a way I can verify the paths? If this could be some other issue, please let me know.

Thanks

The pandas API (pyspark.pandas) is available only in PySpark 3.2 and above.

To upgrade PySpark to its latest release, execute the following command:

!pip install --upgrade pyspark

Remove the "!" if you're not executing the command in a Jupyter Notebook.

After restarting your kernel, import pyspark.pandas as ps should work.

Note

You can also check which PySpark version Python is importing, like so:

import pyspark

print(pyspark.__version__)
# 3.3.0
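
To also answer the "verify the paths" part of the question, here is a minimal sketch that reports where Python would load pyspark from and whether the installed version is new enough. The helper names (supports_pandas_api, describe_pyspark) are made up for illustration, and the check assumes pyspark was installed as a regular pip/conda package; a copy picked up through SPARK_HOME or a hand-edited sys.path may report different metadata.

```python
import re
from importlib import metadata, util


def supports_pandas_api(version: str) -> bool:
    """Return True if the version string is 3.2 or newer (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (3, 2)


def describe_pyspark() -> str:
    """Describe the pyspark installation visible to the current interpreter."""
    # find_spec locates the package on sys.path without importing it.
    spec = util.find_spec("pyspark")
    if spec is None:
        return "pyspark is not installed for this interpreter"
    try:
        version = metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return f"pyspark found at {spec.origin}, but no package metadata for it"
    verdict = "supports" if supports_pandas_api(version) else "does not support"
    return f"pyspark {version} at {spec.origin} {verdict} pyspark.pandas"


print(describe_pyspark())
```

If the reported path is not the environment you expected (for example a different Anaconda installation), the upgrade probably went to another interpreter.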

Update

I've had a look at the change history of broadcast.py (which I believe is where the import is failing), and it seems the location of print_exec was changed from pyspark.cloudpickle to pyspark.util. Upgrading should solve the issue.

Older version of the broadcast.py module:

https://github.com/apache/spark/blob/75ea89ad94ca76646e4697cf98c78d14c6e2695f/python/pyspark/broadcast.py#L24

Newer versions:

https://github.com/apache/spark/blob/8f744783531d4f62abdf82643b5eb34d54a2820b/python/pyspark/broadcast.py#L42
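
If you want to see which of the two modules exposes print_exec in your own installation, a rough sketch like the following can help. The function name modules_exposing is hypothetical, and it only handles ImportError, which is the failure shown in the question; a badly broken install could fail in other ways.

```python
import importlib

# The two modules mentioned above as the old and new homes of print_exec.
CANDIDATES = ("pyspark.util", "pyspark.cloudpickle")


def modules_exposing(name: str, candidates=CANDIDATES) -> list:
    """Return the candidate modules that import cleanly and define `name`."""
    found = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module missing, or pyspark itself is not importable.
            continue
        if hasattr(module, name):
            found.append(module_name)
    return found


print(modules_exposing("print_exec"))
```

An empty list means neither module could be imported with that name, which usually points to a missing or inconsistent pyspark installation.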
