
Scala Spark row-level error handling

I'm having some trouble figuring out how to do row-level error handling in a Scala Spark program. In the code below, I'm reading in a CSV text file, parsing it, and creating a Row using a mapSchema method (not shown; basically, it takes the Array of strings produced by the CSV parser and uses a schema to convert the strings into ints, doubles, dates, etc.). It works great when the data is all formatted appropriately. However, if I have a bad row -- for example, one with fewer fields than expected -- I want to perform some error handling.

val rddFull = sqlContext.sparkContext.textFile(csvPath).map {
  case txt =>
    try {
      val reader = new CSVReader(new StringReader(txt), delimiter, quote, escape, headerLines)
      val parsedRow = reader.readNext()
      Row(mapSchema(parsedRow, schema) : _*)
    } catch {
      case err: Throwable =>
        println("a record had an error: " + txt)
        throw new RuntimeException("SomeError")
    }
}

The problem is that the try/catch expressions don't seem to be working. When I give it a bad row, I never get the "SomeError" RuntimeException. Instead, I get the same error that I get when I don't use try/catch.

Any ideas about what could be going wrong here?

You need to look in the correct place for the logs. To start with: the catch does work. Here is an example from the spark-shell:

val d = sc.parallelize(0 until 10)
val e = d.map { n =>
  try {
    if (n % 3 == 0) throw new IllegalArgumentException("That was a bad call")
    println(n)
  } catch {
    case e: IllegalArgumentException =>
      throw new UnsupportedOperationException("converted from Arg to Op except")
  }
}
e.collect

Here is the result: notice the exception was properly caught and converted:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in
stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in   
stage 0.0 (TID 5, localhost): 
java.lang.UnsupportedOperationException: converted from Arg to Op except
    at $anonfun$1.apply$mcVI$sp(<console>:29)
    at $anonfun$1.apply(<console>:24)
    at $anonfun$1.apply(<console>:24)

Try looking in the stderr logs of one or more of the workers.
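
If the goal is to handle bad rows without failing the whole job, one option (not part of the original question) is to wrap the per-row parsing in scala.util.Try and split the results, instead of rethrowing. A minimal sketch, reusing the question's csvPath, mapSchema, schema, and CSVReader arguments as-is:

import java.io.StringReader
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.Row

// Wrap each row's parsing in Try so a malformed line produces a Failure
// instead of killing the task.
val parsed = sqlContext.sparkContext.textFile(csvPath).map { txt =>
  Try {
    val reader = new CSVReader(new StringReader(txt), delimiter, quote, escape, headerLines)
    Row(mapSchema(reader.readNext(), schema) : _*)
  }
}

// Keep the rows that parsed cleanly...
val goodRows = parsed.collect { case Success(row) => row }
// ...and inspect the failures separately (e.g. log or count them).
val badRows = parsed.collect { case Failure(err) => err.getMessage }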
