How to get the field names situated in an ArrayType column

This is my schema:

    root
     |-- tags: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- context: string (nullable = true)
     |    |    |    |-- key: string (nullable = true)

I want to get the names of the elements context and key, and to change the data type of those fields to an array.

When I try to print the fields using map, I get output like this:

arraydf.schema.fields.map(field1 =>
                println("FIELDS: " + field1))
Output:
FIELDS: StructField(tags,ArrayType(ArrayType(StructType(StructField(context,StringType,true), StructField(key,StringType,true)),true),true),true)

I want my schema to look like the one below: every field that sits under the struct type should become an ArrayType. I am looking for a generic way to do this. Please help.

    root
     |-- tags: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- context: array (nullable = true)
     |    |    |    |-- key: array (nullable = true)

Pattern match over the structure:

import org.apache.spark.sql.types._
import org.apache.spark.sql.DataFrame

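// Return the (name, dataType) pairs of the struct nested two arrays deep in column c.
// Note: the match is partial and throws a MatchError for any other column shape.
def fields(df: DataFrame, c: String) = df.schema(c) match {
  case StructField(_, ArrayType(ArrayType(ss: StructType, _), _), _, _) =>
    ss.fields.map(s => (s.name, s.dataType))
}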

Example:

scala> fields(Seq(Seq(Seq((1, 2)))).toDF, "value")
res7: Array[(String, org.apache.spark.sql.types.DataType)] = Array((_1,IntegerType), (_2,IntegerType))
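
The pattern above is tied to exactly two levels of array nesting. Since the question asks for a generic way, here is a minimal sketch that unwraps any number of array layers and, separately, computes the target schema in which every field under the struct is wrapped in an ArrayType. The helper names innerStructFields and wrapStructFields are hypothetical, and the sketch only works on the schema types; converting the actual data to match the new schema is a separate step that is not shown here.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: unwrap nested ArrayType layers and return the
// (name, dataType) pairs of the struct found underneath, if any.
def innerStructFields(dt: DataType): Seq[(String, DataType)] = dt match {
  case ArrayType(elementType, _) => innerStructFields(elementType)
  case st: StructType            => st.fields.toSeq.map(f => (f.name, f.dataType))
  case _                         => Seq.empty
}

// Hypothetical helper: rebuild the same nesting, but wrap every field of
// the struct in an ArrayType. Assumption: all struct fields are wrapped,
// including ones that might already be arrays.
def wrapStructFields(dt: DataType): DataType = dt match {
  case ArrayType(elementType, containsNull) =>
    ArrayType(wrapStructFields(elementType), containsNull)
  case StructType(structFields) =>
    StructType(structFields.map(f => f.copy(dataType = ArrayType(f.dataType))))
  case other => other
}

// The type of the "tags" column from the question, built by hand.
val tagsType: DataType = ArrayType(ArrayType(StructType(Seq(
  StructField("context", StringType),
  StructField("key", StringType)
))))

val names  = innerStructFields(tagsType).map(_._1)
val target = wrapStructFields(tagsType)
```

With a real DataFrame, the starting type would come from arraydf.schema("tags").dataType instead of the hand-built tagsType.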

From what I understand, you just want to access an element, right? This is done via dot notation for StructType, and getItem for ArrayType (or just square brackets []).

So, if you want to get the values, try:

arraydf.selectExpr("tags[0][0].context", "tags[0][0].key")
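
The same access can be written with the Column API, using getItem for the array levels and getField for the struct level. The sketch below is self-contained: it builds a one-row DataFrame with the schema from the question, and the local SparkSession settings and sample values ("c1", "k1") are assumptions made only for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Assumed local session; in spark-shell a session named spark already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("nested-access-sketch")
  .getOrCreate()

// Schema from the question: array<array<struct<context, key>>>.
val schema = StructType(Seq(
  StructField("tags", ArrayType(ArrayType(StructType(Seq(
    StructField("context", StringType),
    StructField("key", StringType)
  )))))
))

// One sample row, with made-up values.
val rows = java.util.Arrays.asList(Row(Seq(Seq(Row("c1", "k1")))))
val arraydf = spark.createDataFrame(rows, schema)

// getItem(0) steps into each array level, getField picks the struct member.
val first = arraydf.select(
  col("tags").getItem(0).getItem(0).getField("context").as("context"),
  col("tags").getItem(0).getItem(0).getField("key").as("key")
).head()
```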

I suggest you look at the explode() function as well; it could be useful.
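
As a rough illustration of how explode() could apply here, the sketch below flattens both array levels so that each struct ends up in its own row, after which context and key are ordinary columns. The local SparkSession settings, the sample values, and the intermediate column names inner and tag are assumptions made for this example.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types._

// Assumed local session; in spark-shell a session named spark already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("explode-sketch")
  .getOrCreate()

// Schema from the question: array<array<struct<context, key>>>.
val schema = StructType(Seq(
  StructField("tags", ArrayType(ArrayType(StructType(Seq(
    StructField("context", StringType),
    StructField("key", StringType)
  )))))
))

// One sample row holding two structs, with made-up values.
val rows = java.util.Arrays.asList(
  Row(Seq(Seq(Row("c1", "k1"), Row("c2", "k2"))))
)
val arraydf = spark.createDataFrame(rows, schema)

// First explode yields one row per inner array, second one row per struct.
val flat = arraydf
  .select(explode(col("tags")).as("inner"))
  .select(explode(col("inner")).as("tag"))
  .select(col("tag.context"), col("tag.key"))

val result = flat.collect().map(r => (r.getString(0), r.getString(1))).toSeq
```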
