
Pyspark dataframe shows error while displaying the dataframe contents

I am using Spark 2.3.2 and PySpark to read from Hive. Here is my code:

from pyspark import SparkContext
from pyspark.sql import SQLContext
sql_sc = SQLContext(sc)
SparkContext.setSystemProperty("hive.metastore.uris", "thrift://17.20.24.186:9083").enableHiveSupport().getOrCreate()
df=sql_sc.sql("SELECT * FROM mtsods.model_result_abt")
df.show()  # this is where the error occurs

Whenever I try to display the contents of the dataframe, the following error occurs:

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-32-1a6ce2362cd4> in <module>()
----> 1 df.show()

C:\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
    348         """
    349         if isinstance(truncate, bool) and truncate:
--> 350             print(self._jdf.showString(n, 20, vertical))
    351         else:
    352             print(self._jdf.showString(n, int(truncate), vertical))

C:\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

C:\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

C:\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o419.showString.
: java.lang.AssertionError: assertion failed: No plan for HiveTableRelation `mtsods`.`model_result_abt`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [feature#319, profile_id#320, model_id#321, value#322, score#323, rank#324, year_d#325, taxpayer#326, it_ref_no#327]

    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3254)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Even df.count(), df.head(), and df.first() raise the same error. How can I view the contents of the dataframe I created?

Note: the same query works fine in Hue (Cloudera) against Hive.

It is not the show or count operation itself that fails. Spark uses a lazy evaluation model, so nothing is planned or executed until an action is invoked; the assertion here means the session was created without Hive support, so the planner cannot produce a physical plan for the HiveTableRelation.
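
As a sketch of that point, reusing the dataframe from the question: the SELECT succeeds because Spark only records a logical plan, and the assertion is raised only once an action forces physical planning.

# The SQL call itself succeeds: Spark only builds a logical plan here.
df = sql_sc.sql("SELECT * FROM mtsods.model_result_abt")

# Any action triggers physical planning, and that is where the
# "No plan for HiveTableRelation" assertion fails.
df.show()   # fails
df.count()  # fails the same way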

Pass the following configuration when launching with spark-submit:

--conf spark.sql.catalogImplementation=hive 
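
For example (the script name here is hypothetical):

spark-submit --conf spark.sql.catalogImplementation=hive my_script.py

Equivalently, if you build the session in code, enable Hive support on the SparkSession builder. A minimal sketch, reusing the metastore URI from the question:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("hive.metastore.uris", "thrift://17.20.24.186:9083")
         .enableHiveSupport()  # sets spark.sql.catalogImplementation=hive
         .getOrCreate())

df = spark.sql("SELECT * FROM mtsods.model_result_abt")
df.show()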
