简体   繁体   中英

How to get the field names situated in a ArrayType Column

This is my Schema

    root
     |-- tags: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- context: string (nullable = true)
     |    |    |    |-- key: string (nullable = true)

I want to get the name of the elements context and key, and to change the datatype of those variables into an Array.

When I'm trying to get the fields using map, it is showing something like this.

arraydf.schema.fields.map(field1 =>
                println("FIELDS: "+field1)
Output: 
FIELDS:StructField(tags,ArrayType(ArrayType(StructType(StructField(context,StringType,true), StructField(key,StringType,true)),true),true),true)

I want my schema to be like this, the elements whichever will be under struct type should be of arrayType, I want a generic way. Please help me.

    root
     |-- tags: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- context: array (nullable = true)
     |    |    |    |-- key: array (nullable = true)

Pattern match over the structure

import org.apache.spark.sql.types._
import org.apache.spark.sql.DataFrame

def fields(df: DataFrame, c: String) = df.schema(c) match{
  case StructField(_, ArrayType(ArrayType(ss: StructType, _), _), _, _) => 
    ss.fields map { s =>
      (s.name, s.dataType)
    }
}

Example:

scala> fields(Seq(Seq(Seq((1, 2)))).toDF, "value")
res7: Array[(String, org.apache.spark.sql.types.DataType)] = Array((_1,IntegerType), (_2,IntegerType))

From what I get, you just want to access to a element right? This is done via dot notation for StructType, and getItem for ArrayType (or just square brackets []).

So, if you want to have the values, let me say, try:

arraydf.select("tags[0][0].context, tags[0][0].key")

I suggest you to look at explode() function as well, it could be useful.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM