
Spark 1.4.1 py4j.Py4JException: Method read([]) does not exist

I am programming with PySpark in the Eclipse IDE and have been trying to transition to Spark 1.4.1 so that I can finally program using Python 3. The following program works in Spark 1.3.1 but throws an exception in Spark 1.4.1:

from pyspark import SparkContext, SparkConf 
from pyspark.sql.types import * 
from pyspark.sql import SQLContext 

if __name__ == '__main__': 
    conf = SparkConf().setAppName("MyApp").setMaster("local") 

    global sc 
    sc = SparkContext(conf=conf)     

    global sqlc 
    sqlc = SQLContext(sc) 

    symbolsPath = 'SP500Industry.json' 
    symbolsRDD = sqlc.read.json(symbolsPath) 

    print "Done"" 

The traceback I am getting is as follows:

Traceback (most recent call last):
  File "/media/gavin/20A6-76BF/Current Projects Luna/PySpark Test/Test.py", line 21, in <module>
    symbolsRDD = sqlc.read.json(symbolsPath) #rdd with all symbols (and their industries)
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 582, in read
    return DataFrameReader(self)
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 39, in __init__
    self._jreader = sqlContext._ssql_ctx.read()
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o18.read. Trace:
py4j.Py4JException: Method read([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

The external libraries I have added to the project are:
...spark-1.4.1-bin-hadoop2.6/python
...spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip
...spark-1.4.1-bin-hadoop2.6/python/lib/pyspark.zip (tried both with and without this one)
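As an aside, the same libraries can also be wired up from inside the script instead of through Eclipse's build path. A minimal sketch, assuming Spark is installed at the /home/gavin/spark-1.4.1-bin-hadoop2.6 location shown in the traceback (not part of the original question):

import os
import sys

# Assumed install location, taken from the traceback above -- adjust for your machine
SPARK_HOME = '/home/gavin/spark-1.4.1-bin-hadoop2.6'

# Make pyspark and the bundled py4j importable before importing pyspark
sys.path.insert(0, os.path.join(SPARK_HOME, 'python'))
sys.path.insert(0, os.path.join(SPARK_HOME, 'python', 'lib', 'py4j-0.8.2.1-src.zip'))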

Can anyone help me figure out what I am doing wrong?

You need to set the format to 'json' before the call. Otherwise, Spark assumes you are trying to load a Parquet file.

symbolsRDD = sqlc.read.format('json').json(symbolsPath) 
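An equivalent spelling, reusing the sqlc and symbolsPath names from the question, is to set the format explicitly and go through the generic load() entry point (a sketch of the same read, not verbatim from the answer):

# Equivalent in Spark 1.4+: declare the source format, then use the generic load()
symbolsDF = sqlc.read.format('json').load(symbolsPath)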

However, I still cannot figure out why you were getting the read-method error in the first place. Spark should have complained that it found an invalid Parquet file.

