Iterating on org.apache.spark.sql.Row

I'm using the Spark shell (1.3.1), which is a Scala shell. The simplified situation that needs iteration over a Row looks like this:

import org.apache.commons.lang.StringEscapeUtils

val result = sqlContext.sql("....")
val rows = result.collect() // Array[org.apache.spark.sql.Row]
val row = rows(0) // org.apache.spark.sql.Row
val line = row.map(cell => StringEscapeUtils.escapeCsv(cell)).mkString(",")
// error: value map is not a member of org.apache.spark.sql.Row
println(line)

My problem is that Row has no map method and, as far as I know, it cannot be converted to an Array or a List, so I cannot escape each cell in this style. I could write a loop with an index variable, but that would be inconvenient. I would like to iterate over the cells in a situation like this:

result.collect().map(row => row.map(cell => StringEscapeUtils.escapeCsv(cell)).mkString(",")).mkString("\n")

(These results are typically not large; they fit into client memory many times over.)

Is there any way to iterate over the cells of a Row? Is there any syntax for putting an index-based loop in place of row.map(...) in the last snippet?

You can call toSeq on the Row, and the resulting Seq has map. toSeq returns the cells in the same order as the row's columns.
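Applied to the snippet from the question, a minimal sketch might look like this (assuming the same result DataFrame and commons-lang escaping as above; the null check is an extra guard I added, since calling toString on a null cell would throw):

import org.apache.commons.lang.StringEscapeUtils

val csv = result.collect()
  .map(row => row.toSeq                        // Seq[Any], cells in column order
    .map(cell => StringEscapeUtils.escapeCsv(  // escapeCsv expects a String
      if (cell == null) "" else cell.toString))
    .mkString(","))
  .mkString("\n")
println(csv)

If you prefer the index-based style from the question instead, Row also has length and apply, so (0 until row.length).map(i => row(i)) produces the same Seq of cells.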
