
Scala Spark row-level error handling

I'm having some trouble figuring out how to do row-level error handling in a Scala Spark program. In the code below, I'm reading in a CSV text file, parsing it, and creating a Row using a mapSchema method (not shown; basically, it takes the Array of strings produced by the CSV parser and uses a schema to convert the strings into ints, doubles, dates, etc.). It works great when the data is all formatted appropriately. However, if I have a bad row -- for example, one with fewer fields than expected -- I want to perform some error handling.

val rddFull = sqlContext.sparkContext.textFile(csvPath).map {
  case txt =>
    try {
      val reader = new CSVReader(new StringReader(txt), delimiter, quote, escape, headerLines)
      val parsedRow = reader.readNext()
      Row(mapSchema(parsedRow, schema) : _*)
    } catch {
      case err: Throwable =>
        println("a record had an error: " + txt)
        throw new RuntimeException("SomeError")
    }
}

The problem is that the try/catch expressions don't seem to be working. When I give it a bad row, I never get the "SomeError" RuntimeException. Instead, I get the same error that I get when I don't use try/catch.

Any ideas about what could be going wrong here?

You need to look in the correct place for the logs. To start with: the catch does work. Here is an example from the spark-shell:

val d = sc.parallelize(0 until 10)
val e = d.map { n =>
  try {
    if (n % 3 == 0) throw new IllegalArgumentException("That was a bad call")
    println(n)
  } catch {
    case e: IllegalArgumentException =>
      throw new UnsupportedOperationException("converted from Arg to Op except")
  }
}
e.collect

Here is the result: notice the exception was properly caught and converted:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in
stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in   
stage 0.0 (TID 5, localhost): 
java.lang.UnsupportedOperationException: converted from Arg to Op except
    at $anonfun$1.apply$mcVI$sp(<console>:29)
    at $anonfun$1.apply(<console>:24)
    at $anonfun$1.apply(<console>:24)

Try looking in the stderr logs of one or more of the workers.
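
If the goal is to handle bad rows without failing the whole job, one option (not part of the original question) is to wrap the per-row parsing in scala.util.Try and split the results, instead of rethrowing. A minimal sketch, reusing the question's csvPath, mapSchema, schema, and CSVReader arguments as-is:

import java.io.StringReader
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.Row

// Wrap each row's parsing in Try so a malformed line produces a Failure
// instead of killing the task.
val parsed = sqlContext.sparkContext.textFile(csvPath).map { txt =>
  Try {
    val reader = new CSVReader(new StringReader(txt), delimiter, quote, escape, headerLines)
    Row(mapSchema(reader.readNext(), schema) : _*)
  }
}

// Keep the rows that parsed cleanly...
val goodRows = parsed.collect { case Success(row) => row }
// ...and inspect the failures separately (e.g. log or count them).
val badRows = parsed.collect { case Failure(err) => err.getMessage }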
