Iterating on org.apache.spark.sql.Row

I'm using the Spark shell (1.3.1), which is a Scala shell. The simplified situation that needs iteration over a Row looks like this:

import org.apache.commons.lang.StringEscapeUtils

val result = sqlContext.sql("....")
val rows = result.collect() // Array[org.apache.spark.sql.Row]
val row = rows(0) // org.apache.spark.sql.Row
val line = row.map(cell => StringEscapeUtils.escapeCsv(cell)).mkString(",")
// error: value map is not a member of org.apache.spark.sql.Row
println(line)

My problem is that Row has no map method and, as far as I know, it cannot be converted to an Array or a List, so I cannot escape each cell in this style. I could write a loop with an index variable, but that would be inconvenient. I would like to iterate over the cells in a situation like this:

result.collect().map(row => row.map(cell => StringEscapeUtils.escapeCsv(cell)).mkString(",")).mkString("\n")

(These results are typically not large; they fit into client memory many times over.)

Is there any way to iterate over the cells of a Row? Is there any syntax for putting an index-based loop in place of row.map(...) in the last snippet?

You can call toSeq on the Row, and the resulting Seq has map. toSeq returns the cells in the same order as the row's columns.
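Applied to the snippet from the question, a minimal sketch might look like this (assuming the same result DataFrame and commons-lang escaping as above; the null check is an extra guard I added, since calling toString on a null cell would throw):

import org.apache.commons.lang.StringEscapeUtils

val csv = result.collect()
  .map(row => row.toSeq                        // Seq[Any], cells in column order
    .map(cell => StringEscapeUtils.escapeCsv(  // escapeCsv expects a String
      if (cell == null) "" else cell.toString))
    .mkString(","))
  .mkString("\n")
println(csv)

If you prefer the index-based style from the question instead, Row also has length and apply, so (0 until row.length).map(i => row(i)) produces the same Seq of cells.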
