
Apache Spark: get elements of Row by name

In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff:

def foo(r: Row) = {
  // Build a field-name -> index map from the row's schema on every call
  val ix = (0 until r.schema.length).map(i => r.schema(i).name -> i).toMap
  val field1 = r.getString(ix("field1"))
  val field2 = r.getLong(ix("field2"))
  ...
}
dataframe.map(foo)

I figure there must be a better way - this is pretty verbose, it requires creating this extra structure, and it also requires knowing the types explicitly; if they are incorrect, it will produce a runtime exception rather than a compile-time error.

You can use `getAs` from `org.apache.spark.sql.Row`:

r.getAs[String]("field1")
r.getAs[Long]("field2")

Read more about `getAs(java.lang.String fieldName)` in the Row API documentation.
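Putting this answer back into the question's `foo`, a minimal sketch (assuming, as in the question, a `String` column `field1` and a `Long` column `field2`; the column names are illustrative):

```scala
import org.apache.spark.sql.Row

def foo(r: Row): (String, Long) = {
  // getAs[T](fieldName) looks up the index in the row's schema for you,
  // so no hand-built name -> index map is needed. Note the type parameter
  // is still only checked at runtime, like getString/getLong.
  val field1 = r.getAs[String]("field1")
  val field2 = r.getAs[Long]("field2")
  (field1, field2)
}

// dataframe.map(foo)  // requires an actual DataFrame / SparkSession
```

`Row.fieldIndex("field1")` is also available if you only need the position and want to keep the typed `getString`/`getLong` accessors.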

This is not supported at this time in the Scala API. The closest you have is this JIRA, titled "Support converting DataFrames to typed RDDs".
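Since this answer was written, Spark (1.6+) added typed Datasets, which give the compile-time field names the question asks for. A hedged sketch, assuming columns `field1`/`field2` exist and a `SparkSession` is in scope:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Case class field names must match the DataFrame's column names.
case class Record(field1: String, field2: Long)

def toTyped(spark: SparkSession, df: DataFrame): Dataset[Record] = {
  import spark.implicits._  // brings the Encoder[Record] into scope
  // Mismatched names or incompatible types fail when the encoder is
  // resolved, rather than deep inside a map over untyped Rows.
  df.as[Record]
}

// toTyped(spark, dataframe).map(rec => rec.field1 -> rec.field2)
```

Field access on `Record` is then ordinary, statically typed Scala.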
