Scala Spark SQL: java.util.NoSuchElementException: key not found: LINE_TYPE

I have a Scala Spark SQL job that runs fine on Spark 2.1. When testing it on Spark 2.3, part of the code fails with the error:

java.util.NoSuchElementException: key not found: LINE_TYPE

where LINE_TYPE is in fact one of the columns.

Looking at the code, temp 1 through temp 7 run fine (I tried .show() for each temp_*Df), but temp 8 throws the error. Is it because there are too many subqueries, or are self-joins not allowed in 2.3?
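One isolated check for the self-join hypothesis would be a bare self-join on the same view (a sketch only; the temp_7Df view and its ORGN_CODE/BATCH_NO/LINE_NO columns are taken from the temp 8 query below, everything else is illustrative):

// Minimal self-join on the same temp view: if this runs cleanly on 2.3,
// self-joins as such are not what breaks temp 8.
val probe = spark.sql("""
  SELECT a.ORGN_CODE, a.BATCH_NO, a.LINE_NO
  FROM temp_7Df a
  JOIN temp_7Df b
    ON a.ORGN_CODE = b.ORGN_CODE
   AND a.BATCH_NO = b.BATCH_NO
  LIMIT 10""")
probe.show()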

I tried repartitioning temp_7Df, which is the source for temp 8:

var temp_7Df = spark.sql("select * from temp_7Df").repartition($"LINE_TYPE",$"ORGN_CODE",$"BATCH_NO")
temp_7Df.createOrReplaceTempView("temp_7Df")

I still have no clue; I'd appreciate anyone who can shed some light on this.

** Update: persisting temp7 to storage

import org.apache.spark.storage.StorageLevel

// temp_df is already a DataFrame, so no toDF() conversion is needed;
// persist it in memory (spilling to disk) before re-registering the view
var temp_7Rdd = temp_df
temp_7Rdd.persist(StorageLevel.MEMORY_AND_DISK)
temp_7Rdd.createOrReplaceTempView("temp_7Df")

Code (partial):

spark-shell --jars /intake/jar/PPG_HIVE_UDFS-0.0.1-jar-with-dependencies.jar --conf spark.executor.memory=15G --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.executor.instances=20 --conf spark.driver.memory=8g --conf spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf spark.network.timeout=800 --conf spark.sql.cbo.joinReorder.enabled=true --conf spark.dynamicAllocation.executorIdleTimeout=90s --conf spark.dynamicAllocation.initialExecutors=10 --conf spark.dynamicAllocation.minExecutors=15 --conf spark.dynamicAllocation.maxExecutors=40 --conf spark.shuffle.service.enabled=true --name application --queue queue

spark.conf.set("spark.eventLog.enabled", "true")
spark.conf.set("spark.io.compression.codec", "snappy")
spark.conf.set("spark.rdd.compress", "true")
spark.conf.set("spark.shuffle.service.enabled", "true")
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
spark.conf.set("spark.sql.orc.enabled", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")
spark.conf.set("spark.sql.cbo.enabled", "true")
spark.conf.set("spark.sql.shuffle.partitions", "2100")

val crc64 = spark.sql("""CREATE TEMPORARY FUNCTION CRC64 AS 'some UDF'""")

var temp_df = spark.sql("""select from source""")
temp_df.createOrReplaceTempView("temp_1Df")

var temp_df = spark.sql("""select from source""")
temp_df.createOrReplaceTempView("temp_2Df")

var temp_df = spark.sql("""select from source and temp_2Df""")
temp_df.createOrReplaceTempView("temp_3Df")

var temp_df = spark.sql("""select from source and temp_2Df, temp_3Df""")
temp_df.createOrReplaceTempView("temp_4Df")

var temp_df = spark.sql("""select from source and temp_1Df, temp_2Df, temp_3Df""")
temp_df.createOrReplaceTempView("temp_5Df")

var temp_df = spark.sql("""SELECT * FROM temp_4Df
UNION ALL
SELECT * FROM temp_5Df""")
temp_df.createOrReplaceTempView("temp_6Df")

var temp_df = spark.sql("""select from source and temp_6Df""")
temp_df.createOrReplaceTempView("temp_7Df")

var temp_df = spark.sql("""SELECT 
/*+ STREAMTABLE(AL) */ 
AL.ORGN_CODE,
AL.BATCH_NO,
AL.inventory_item_id OUTPUT_inventory_ITEM_ID,
AL.LINE_NO,
AL.LINE_TYPE,
AL.LOT_ID OUTPUT_LOT_ID,
AL.BATCH_QTY OUTPUT_ACTUAL_QTY,
AL.Batch_qty_Primary_uom OUTPUT_ACTUAL_QTY_PRIMARY_UOM
FROM temp_7Df AL JOIN
(SELECT CC.ORGN_CODE, CC.BATCH_NO, MIN(CC.LINE_NO) LINE_NO
 FROM
   (
    SELECT AA.ORGN_CODE, AA.BATCH_NO, MAX(AA.BATCH_QTY) BATCH_QTY
    FROM temp_7Df AA
    WHERE AA.LINE_TYPE=1
    GROUP BY AA.ORGN_CODE,AA.BATCH_NO
    ) BB JOIN
temp_7Df CC ON CC.ORGN_CODE=BB.ORGN_CODE
AND CC.BATCH_NO=BB.BATCH_NO
AND CC.BATCH_QTY=BB.BATCH_QTY
WHERE CC.LINE_TYPE=1
GROUP BY CC.ORGN_CODE,CC.BATCH_NO
) DD ON AL.ORGN_CODE = DD.ORGN_CODE
AND AL.BATCH_NO = DD.BATCH_NO
AND AL.LINE_NO = DD.LINE_NO
WHERE AL.LINE_TYPE = 1""")

temp_df.createOrReplaceTempView("temp_8Df")

var testresult = spark.sql("""select * from temp_8Df""")
testresult.show()

Error output (partial):

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(ITEM_ID#1364, LINE_NO#1218, BATCH_ID#1216, 2100)
+- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, LINE_TYPE#1220, actual_qty#1224, ITEM_ID#1364]
   +- *(121) BroadcastHashJoin [ITEM_UM#1222], [UM_CODE#1646], Inner, BuildRight
      :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, LINE_TYPE#1220, item_um#1222, actual_qty#1224, ITEM_ID#1364]
      :  +- *(121) BroadcastHashJoin [ITEM_UM#1370], [UM_CODE#48], Inner, BuildRight
      :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, LINE_TYPE#1220, item_um#1222, actual_qty#1224, ITEM_ID#1364, ITEM_UM#1370]
      :     :  +- *(121) BroadcastHashJoin [ITEM_ID#1364], [ITEM_ID#1630], LeftOuter, BuildRight
      :     :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, LINE_TYPE#1220, item_um#1222, actual_qty#1224, ITEM_ID#1364, ITEM_UM#1370]
      :     :     :  +- *(121) BroadcastHashJoin [ITEM_ID#1364], [ITEM_ID#1608], LeftOuter, BuildRight
      :     :     :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, LINE_TYPE#1220, item_um#1222, actual_qty#1224, ITEM_ID#1364, ITEM_UM#1370]
      :     :     :     :  +- *(121) BroadcastHashJoin [ITEM_ID#1364], [ITEM_ID#6], LeftOuter, BuildRight
      :     :     :     :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, LINE_TYPE#1220, item_um#1222, actual_qty#1224, ITEM_ID#1364, ITEM_UM#1370]
      :     :     :     :     :  +- *(121) BroadcastHashJoin [ORGN_CODE#1287, ITEM_ID#1364], [ORGN_CODE#1493, ITEM_ID#1554], Inner, BuildRight
      :     :     :     :     :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, LINE_TYPE#1220, item_um#1222, actual_qty#1224, orgn_code#1287, ITEM_ID#1364, ITEM_UM#1370]
      :     :     :     :     :     :  +- *(121) BroadcastHashJoin [ITEM_ID#1219], [ITEM_ID#1364], Inner, BuildRight
      :     :     :     :     :     :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, item_id#1219, LINE_TYPE#1220, item_um#1222, actual_qty#1224, orgn_code#1287]
      :     :     :     :     :     :     :  +- *(121) BroadcastHashJoin [RECIPE_ID#1286, ORGN_CODE#1287], [RECIPE_ID#1347, ORGN_CODE#1348], LeftOuter, BuildRight
      :     :     :     :     :     :     :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, item_id#1219, LINE_TYPE#1220, item_um#1222, actual_qty#1224, recipe_id#1286, orgn_code#1287]
      :     :     :     :     :     :     :     :  +- *(121) BroadcastHashJoin [RECIPE_VALIDITY_RULE_ID#2924], [RECIPE_VALIDITY_RULE_ID#1285], Inner, BuildRight
      :     :     :     :     :     :     :     :     :- *(121) Project [plant_code#2919, BATCH_NO#2920, RECIPE_VALIDITY_RULE_ID#2924, MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, item_id#1219, LINE_TYPE#1220, item_um#1222, actual_qty#1224]
      :     :     :     :     :     :     :     :     :  +- ShuffledHashJoin [BATCH_ID#2918], [BATCH_ID#1216], Inner, BuildLeft
      :     :     :     :     :     :     :     :     :     :- Exchange hashpartitioning(BATCH_ID#2918, 2100)
      :     :     :     :     :     :     :     :     :     :  +- *(110) Project [BATCH_ID#2918, plant_code#2919, BATCH_NO#2920, RECIPE_VALIDITY_RULE_ID#2924]
      :     :     :     :     :     :     :     :     :     :     +- *(110) Filter (((((batch_status#2932 IN (3,4) && (cast(date_format(cast(actual_cmplt_date#2931 as timestamp), yyyyMM, Some(Etc/UTC)) as int) >= 201904)) && isnotnull(BATCH_ID#2918)) && isnotnull(RECIPE_VALIDITY_RULE_ID#2924)) && isnotnull(BATCH_NO#2920)) && isnotnull(plant_code#2919))
      :     :     :     :     :     :     :     :     :     :        +- *(110) FileScan orc conformed_staging.cv_appoea_gme_gme_batch_header[batch_id#2918,plant_code#2919,batch_no#2920,recipe_validity_rule_id#2924,actual_cmplt_date#2931,batch_status#2932] Batched: true, Format: ORC, Location: InMemoryFileIndex[adl://ppgdadatalakeuat.azuredatalakestore.net/data/intake/conformed/staging/fin..., PartitionFilters: [], PushedFilters: [In(batch_status, [3,4]), IsNotNull(BATCH_ID), IsNotNull(RECIPE_VALIDITY_RULE_ID), IsNotNull(BATC..., ReadSchema: struct<batch_id:string,plant_code:string,batch_no:string,recipe_validity_rule_id:string,actual_cm...
      :     :     :     :     :     :     :     :     :     +- Exchange hashpartitioning(BATCH_ID#1216, 2100)
      :     :     :     :     :     :     :     :     :        +- *(111) Project [MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, item_id#1219, LINE_TYPE#1220, item_um#1222, actual_qty#1224]
      :     :     :     :     :     :     :     :     :           +- *(111) Filter (((((((isnotnull(CONTRIBUTE_YIELD_IND#1273) && isnotnull(LINE_TYPE#1220)) && (CONTRIBUTE_YIELD_IND#1273 = Y)) && (cast(LINE_TYPE#1220 as int) = 1)) && isnotnull(BATCH_ID#1216)) && isnotnull(ITEM_ID#1219)) && isnotnull(ITEM_UM#1222)) && isnotnull(ACTUAL_QTY#1224))
      :     :     :     :     :     :     :     :     :              +- *(111) FileScan orc conformed_staging.cv_appoea_gme_gme_material_details[material_detail_id#1215,batch_id#1216,line_no#1218,item_id#1219,line_type#1220,item_um#1222,actual_qty#1224,contribute_yield_ind#1273] Batched: true, Format: ORC, Location: InMemoryFileIndex[adl://ppgdadatalakeuat.azuredatalakestore.net/data/intake/conformed/staging/fin..., PartitionFilters: [], PushedFilters: [IsNotNull(contribute_yield_ind), IsNotNull(LINE_TYPE), EqualTo(contribute_yield_ind,Y), IsNotNul..., ReadSchema: struct<material_detail_id:string,batch_id:string,line_no:string,item_id:string,line_type:string,i...
      :     :     :     :     :     :     :     :     +- ReusedExchange [recipe_validity_rule_id#1285, recipe_id#1286, orgn_code#1287], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      :     :     :     :     :     :     :     +- ReusedExchange [recipe_id#1347, orgn_code#1348], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true], input[1, string, true]))
      :     :     :     :     :     :     +- ReusedExchange [ITEM_ID#1364, ITEM_UM#1370], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      :     :     :     :     :     +- ReusedExchange [orgn_code#1493, item_id#1554], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true], input[1, string, true]))
      :     :     :     :     +- ReusedExchange [ITEM_ID#6], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      :     :     :     +- ReusedExchange [item_id#1608], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      :     :     +- ReusedExchange [item_id#1630], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      :     +- ReusedExchange [UM_CODE#48], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      +- ReusedExchange [UM_CODE#1646], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))

  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
  at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doExecute(SortMergeJoinExec.scala:150)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:557)
  at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:557)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:557)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 228 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(BATCH_ID#1216, 2100)
+- *(111) Project [MATERIAL_DETAIL_ID#1215, batch_id#1216, LINE_NO#1218, item_id#1219, LINE_TYPE#1220, item_um#1222, actual_qty#1224]
   +- *(111) Filter (((((((isnotnull(CONTRIBUTE_YIELD_IND#1273) && isnotnull(LINE_TYPE#1220)) && (CONTRIBUTE_YIELD_IND#1273 = Y)) && (cast(LINE_TYPE#1220 as int) = 1)) && isnotnull(BATCH_ID#1216)) && isnotnull(ITEM_ID#1219)) && isnotnull(ITEM_UM#1222)) && isnotnull(ACTUAL_QTY#1224))
      +- *(111) FileScan orc conformed_staging.cv_appoea_gme_gme_material_details[material_detail_id#1215,batch_id#1216,line_no#1218,item_id#1219,line_type#1220,item_um#1222,actual_qty#1224,contribute_yield_ind#1273] Batched: true, Format: ORC, Location: InMemoryFileIndex[adl://ppgdadatalakeuat.azuredatalakestore.net/data/intake/conformed/staging/fin..., PartitionFilters: [], PushedFilters: [IsNotNull(contribute_yield_ind), IsNotNull(LINE_TYPE), EqualTo(contribute_yield_ind,Y), IsNotNul..., ReadSchema: struct<material_detail_id:string,batch_id:string,line_no:string,item_id:string,line_type:string,i...

  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.joins.ShuffledHashJoinExec.doExecute(ShuffledHashJoinExec.scala:67)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 278 more
Caused by: java.util.NoSuchElementException: key not found: LINE_TYPE
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:59)
  at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.org$apache$spark$sql$execution$datasources$orc$OrcFilters$$buildSearchArgument(OrcFilters.scala:198)
  at org.apache.spark.sql.execution.datasources.orc.OrcFilters$$anonfun$2.apply(OrcFilters.scala:69)
  at org.apache.spark.sql.execution.datasources.orc.OrcFilters$$anonfun$2.apply(OrcFilters.scala:68)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:68)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:145)
  at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297)
  at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295)
  at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315)
  at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121)
  at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 323 more

The error was already present earlier; temp_8Df is where it surfaces because that is where you trigger the action show().

Before that, Spark is only building the DAG for lazy execution.

Actual execution starts only once an action is invoked, and that is when the error is thrown. So try an action at each earlier step and you can debug it.
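For instance (a sketch, assuming the temp_*Df views registered in the question), forcing a cheap action after each step pins down exactly where the plan breaks:

// spark.sql and createOrReplaceTempView only build the logical plan;
// nothing is read or computed until an action runs.
val step7 = spark.sql("SELECT * FROM temp_7Df")

// An action forces execution of the entire upstream DAG. If this
// succeeds but a show() on temp_8Df fails, the problem is in the last step.
step7.show(5)
println(step7.count())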
