Spark dataframe will not show() - Py4JJavaError: An error occurred while calling o426.showString
I have a dataframe that I cannot .show(). Every time it gives the following error. Is it possible that there is a corrupted column?
Error:
Py4JJavaError: An error occurred while calling o426.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 381.0 failed 4 times, most recent failure: Lost task 0.3 in stage 381.0 (TID 19204, ddlps28.rsc.dwo.com, executor 99): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/python/pyspark/worker.py", line 177, in main
Your error most likely isn't actually in the show operation itself; .show() is simply what triggers execution of your DAG. You said it works if you don't run your UDF, so you probably just have an error inside that UDF. The actual Python traceback will be on the worker nodes, so try going through your Hadoop UI to the executor logs to see what is really breaking.
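Because Spark evaluates lazily, a bug in a UDF only surfaces when an action like .show() runs, far from where the UDF was defined. A common way to isolate it is to test the UDF's underlying plain-Python function directly, outside Spark, on representative rows (including nulls and malformed values). This is a minimal sketch; `parse_amount` is a hypothetical stand-in for whatever logic your UDF wraps:

```python
def parse_amount(s):
    # Hypothetical UDF body: parse a comma-grouped number string.
    # Spark UDFs receive None for null column values, so handle it explicitly.
    if s is None:
        return None
    return float(s.replace(",", ""))

# Exercise the function on edge cases BEFORE wrapping it with
# pyspark.sql.functions.udf(...). Any exception surfaces here with a
# clear traceback, instead of as a failed task deep inside a Spark stage.
samples = ["1,234.5", None, "42"]
results = [parse_amount(s) for s in samples]
```

If the function passes on all the value shapes your column can contain, register it as a UDF as before; if a row type you did not expect (e.g. a stray string in a numeric column) makes it raise, that is exactly the failure the executor log would show.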