
Error when creating SparkSession in PySpark

When I try to create a SparkSession I get this error:

spark = SparkSession.builder.appName("Practice").getOrCreate()
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM

This is my code:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Practice").getOrCreate()

What am I doing wrong? I am following an online tutorial and the commands are exactly the same. However, the tutorial runs in Jupyter notebooks and I am running the code in VS Code.

Traceback:

22/09/01 08:50:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "c:\Users\BERNARD JOSHUA\OneDrive\Desktop\Swinburne Computer Science\PySpark\pySpark_test.py", line 4, in <module>
    spark = SparkSession.builder.appName("Practice").getOrCreate()
  File "C:\Users\BERNARD JOSHUA\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\session.py", line 269, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Users\BERNARD JOSHUA\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\context.py", line 483, in getOrCreate    
    SparkContext(conf=conf or SparkConf())
  File "C:\Users\BERNARD JOSHUA\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\context.py", line 197, in __init__       
    self._do_init(
  File "C:\Users\BERNARD JOSHUA\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\context.py", line 302, in _do_init       
    self._jvm.PythonUtils.getPythonAuthSocketTimeout(self._jsc)
  File "C:\Users\BERNARD JOSHUA\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\java_gateway.py", line 1547, in __getattr__ 
    raise Py4JError(
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM
PS C:\Users\BERNARD JOSHUA\OneDrive\Desktop\Swinburne Computer Science\PySpark> SUCCESS: The process with PID 18428 (child process of PID 11272) has been terminated.
SUCCESS: The process with PID 11272 (child process of PID 16416) has been terminated.
SUCCESS: The process with PID 16416 (child process of PID 788) has been terminated.

Both my PySpark and Spark are the same version.
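
A quick way to sanity-check that claim (a minimal sketch; it assumes spark-submit is on your PATH) is to compare the pip-installed package version against the Spark distribution's own version:

import subprocess

import pyspark

# Version of the pip-installed PySpark package
print("pyspark:", pyspark.__version__)

# Version of the Spark distribution itself; prints its banner to the console
subprocess.run(["spark-submit", "--version"])

If the two numbers differ, aligning them (for example by reinstalling the pyspark package at the distribution's version) is worth trying before anything else.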

You can try any of the following solutions:

Solution 1

Install findspark:

pip install findspark

In your code use:

import findspark
findspark.init() 

Optionally, you can also specify "/path/to/spark" in the init method above:

findspark.init("/path/to/spark")
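
Putting Solution 1 together with the code from the question, a minimal sketch looks like this (the explicit path is optional; findspark can usually locate Spark from SPARK_HOME on its own):

import findspark

# Must run before any pyspark import: findspark locates the Spark
# installation and puts its python/ and py4j libraries on sys.path.
findspark.init()  # or findspark.init("/path/to/spark")

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Practice").getOrCreate()
print(spark.version)  # should now start without the Py4JError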

Solution 2:

As outlined in pyspark error does not exist in the jvm error when initializing SparkContext, adding a PYTHONPATH environment variable with the value

%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%

(just check which py4j version you have in your spark/python/lib folder) helped resolve this issue.
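
If you prefer to set this up from Python rather than through system environment variables, a rough equivalent is the sketch below (it is essentially what findspark does for you). The SPARK_HOME path and the py4j zip filename are placeholders; check the actual filename in your own spark/python/lib folder.

import os
import sys

# Placeholder paths: point these at your own installation.
spark_home = r"C:\spark"  # your SPARK_HOME
py4j_zip = os.path.join(spark_home, "python", "lib",
                        "py4j-0.10.9.5-src.zip")  # check the real version

os.environ["SPARK_HOME"] = spark_home
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, py4j_zip)

# Import pyspark only after the paths are in place.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Practice").getOrCreate()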

If the error you see is instead no attribute 'getorCreate'. Did you mean: 'getOrCreate'?, try capitalising the "o".
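
For completeness, the corrected line from the question's code would be:

spark = SparkSession.builder.appName("Practice").getOrCreate()  # capital "O" in getOrCreate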
