Scala Spark - how to iterate fields in a Dataframe

My DataFrame has several columns of different types (string, double, Map, array, etc.).

I need to perform some operation on columns of certain types, and I am looking for a clean way to identify each field's type and then take the appropriate action.

types: String|Double|Map<String,Int>|...

|----------------------------------------------------------------------------
|myString1 |myDouble1|myMap1                                   |...otherTypes
|----------------------------------------------------------------------------
|"string_1"|  123.0  |{"str1Map":1,"str2":2, "str31inmap": 31} |...
|"string_2"|  456.0  |{"str2Map":2,"str22":2, "str32inmap": 32}|...
|"string_3"|  789.0  |{"str3Map":3,"str23":2, "str33inmap": 33}|...
|----------------------------------------------------------------------------

Iterating the dataframe fields and printing: df.schema.fields.foreach { println }

outputs:

StructField(myString1,StringType,true)
StructField(myDouble1,DoubleType,false)
StructField(myMap1,MapType(StringType,IntegerType,false),true)
...
StructField(myStringList,ArrayType(StringType,true),true)

So my idea is to iterate through the fields, and whenever a field has one of the types I need to operate on (e.g. the Map type), I know the field name/column and which action to take.

 df.schema.fields.foreach { f =>
     val fName = ?get the name
     val fType = ?get the Type
     print("Name{} Type:{}".format(fName , fType))

      // case type is Map do action X
      // case type is String do action Y
      // ...

    }

Does this approach make sense for detecting the field types in my DataFrame and then performing different operations on the fields depending on their type? How can I get it to work?

Note that String.format in Scala needs %s placeholders; the {} placeholders are Python's format syntax.

This should work:

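 df.dtypes.foreach {  f =>
      // df.dtypes yields (columnName, dataType.toString) pairs
      val fName = f._1
      val fType = f._2
      if (fType == "StringType") { println("STRING_TYPE") }
      // parameterized types print as e.g. MapType(StringType,IntegerType,false),
      // so compare on the prefix rather than the whole string
      if (fType.startsWith("MapType")) { println("MAP_TYPE") }
      //else {println("....")}
      println("Name %s Type:%s - all:%s".format(fName, fType, f))

    }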
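
As an alternative to comparing type names as strings, each field's dataType can be pattern matched directly, which is closer to the "case type is Map do action X" idea in the question. The following is only a sketch: the object name FieldTypes, the helpers describe and mapColumns, and the labels they return are made up for illustration, and it assumes the Spark SQL types package is on the classpath.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper object; the names and labels are illustrative only.
object FieldTypes {

  // Pairs each field name with a label chosen by matching on its DataType.
  def describe(schema: StructType): Seq[(String, String)] =
    schema.fields.toSeq.map { f =>
      val label = f.dataType match {
        case StringType                          => "string"
        case DoubleType                          => "double"
        case MapType(StringType, IntegerType, _) => "map<string,int>"
        case _: MapType                          => "map"
        case ArrayType(elementType, _)           => s"array of ${elementType.simpleString}"
        case other                               => other.simpleString
      }
      f.name -> label
    }

  // Names of all columns whose type is a MapType, whatever its key/value types.
  def mapColumns(schema: StructType): Seq[String] =
    schema.fields.toSeq.collect { case StructField(name, _: MapType, _, _) => name }
}
```

With a DataFrame df, FieldTypes.describe(df.schema).foreach(println) would print one (name, label) pair per column, and FieldTypes.mapColumns(df.schema) gives the column names to which a Map-specific operation could then be applied.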
