Apache Spark: get elements of Row by name
In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff:
import org.apache.spark.sql.Row

def foo(r: Row) = {
  // Build a name -> index map from the Row's schema
  val ix = (0 until r.schema.length).map(i => r.schema(i).name -> i).toMap
  val field1 = r.getString(ix("field1"))
  val field2 = r.getLong(ix("field2"))
  ...
}
dataframe.map(foo)
I figure there must be a better way - this is pretty verbose, it requires creating an extra structure, and it requires knowing the types explicitly; if they are incorrect, it will produce a runtime exception rather than a compile-time error.
You can use getAs from org.apache.spark.sql.Row:

r.getAs[String]("field1")
r.getAs[Long]("field2")

See the API documentation for getAs(java.lang.String fieldName) to know more.
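For illustration, here is a runnable sketch of the getAs pattern. Since Spark's Row requires a Spark dependency, this uses a minimal mock class with the same getAs[T](fieldName) shape; the class name, field names, and values are invented for the demo:

```scala
// Minimal stand-in for org.apache.spark.sql.Row, just to show the getAs
// pattern without a Spark dependency. Names and values here are invented.
case class MockRow(values: Map[String, Any]) {
  // Mirrors Row.getAs[T](fieldName): look up the value by name, cast to T
  def getAs[T](fieldName: String): T = values(fieldName).asInstanceOf[T]
}

val r = MockRow(Map("field1" -> "hello", "field2" -> 42L))
val field1 = r.getAs[String]("field1") // "hello"
val field2 = r.getAs[Long]("field2")   // 42L
```

Note that even with an explicit type parameter, the cast happens at runtime, so a wrong type still fails with a ClassCastException rather than a compile-time error.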
This is not supported at this time in the Scala API. The closest you have is this JIRA titled "Support converting DataFrames to typed RDDs".