
Creating Spark SQL's StructType: use add method or a constructor?

I am creating a StructType from the schema of a custom Java class, from which I can extract each column's name and data type.

From what I know, there are two ways to construct a StructType:

  1. Use the add method
  2. Use the constructor, passing in an array of StructField

I could use either approach, since I loop over my custom schema class and extract the fields one by one. My concern is that add appears to create a new StructType every time it is called, which seems like an unnecessarily expensive way to build a schema, so I am wondering whether it really does create a new object on each call. If it does not, I think add is a better choice than building a new ArrayList of StructField.

If you check the source code of the StructType class, you will see that the add method builds a StructType from the existing fields plus one new StructField, so it does create a new StructType on every call. The original object is left unchanged.

def add(name: String, dataType: DataType): StructType = {
    StructType(fields :+ new StructField(name, dataType, nullable = true, Metadata.empty))
}
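To see what that copy-and-append pattern means for callers without needing Spark on the classpath, here is a minimal sketch. MiniSchema is a hypothetical stand-in written for this illustration, not Spark's class; it only mimics the shape of add shown above, with field names as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for StructType (NOT Spark's class); fields are plain strings.
final class MiniSchema {
    private final List<String> fields;

    MiniSchema() {
        this(Collections.<String>emptyList());
    }

    private MiniSchema(List<String> fields) {
        this.fields = Collections.unmodifiableList(fields);
    }

    // Same shape as the add method above: copy the fields, append one, wrap in a new object.
    MiniSchema add(String name) {
        List<String> copy = new ArrayList<>(fields);
        copy.add(name);
        return new MiniSchema(copy);
    }

    List<String> fields() {
        return fields;
    }

    public static void main(String[] args) {
        MiniSchema s1 = new MiniSchema().add("name");
        s1.add("age");                     // result discarded, s1 is unchanged
        MiniSchema s2 = s1.add("age");     // result kept in a new variable
        System.out.println(s1.fields());   // [name]
        System.out.println(s2.fields());   // [name, age]
        System.out.println(s1 == s2);      // false
    }
}
```

The practical consequence is that the return value of add must be assigned; calling it for its side effect does nothing.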

You can verify this with the following sample program, which prints the hash code and contents of each schema after every step.

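import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class QuickTest {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession
                .builder()
                .appName("QuickTest")
                .master("local[*]")
                .getOrCreate();

        // Start with a one-field schema.
        StructType st1 = new StructType().add("name", DataTypes.StringType);
        System.out.println("hashCode " + st1.hashCode());
        System.out.println("structType " + st1.toString());

        // add() without assigning the result: st1 is unchanged and the new StructType is discarded.
        st1.add("age", DataTypes.IntegerType);
        System.out.println("hashCode " + st1.hashCode());
        System.out.println("structType " + st1.toString());

        // add() with the result assigned: st2 holds both fields.
        StructType st2 = st1.add("age", DataTypes.IntegerType);
        System.out.println("hashCode " + st2.hashCode());
        System.out.println("structType " + st2.toString());

        // Constructor with an array of StructField.
        // Metadata.empty() replaces the original null so the fields match what add() creates.
        StructType st3 = new StructType(new StructField[] {
                new StructField("name", DataTypes.StringType, true, Metadata.empty()),
                new StructField("age", DataTypes.IntegerType, true, Metadata.empty())
        });
        System.out.println("hashCode " + st3.hashCode());
        System.out.println("structType " + st3.toString());

        sparkSession.stop();
    }
}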
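For the loop described in the question, both routes could look roughly like the sketch below. The columns map is a hypothetical stand-in for the custom schema class, which the question does not show, and SchemaBuilders, viaAdd and viaList are names invented for this example. DataTypes.createStructField and DataTypes.createStructType are Spark's Java-friendly factory methods; the sketch assumes every column is nullable, which is what add(name, dataType) uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

class SchemaBuilders {
    // Route 1: call add() in a loop, reassigning because each call returns a new StructType.
    static StructType viaAdd(Map<String, DataType> columns) {
        StructType schema = new StructType();
        for (Map.Entry<String, DataType> column : columns.entrySet()) {
            schema = schema.add(column.getKey(), column.getValue());
        }
        return schema;
    }

    // Route 2: collect the StructFields first, then build the StructType once.
    static StructType viaList(Map<String, DataType> columns) {
        List<StructField> fields = new ArrayList<>();
        for (Map.Entry<String, DataType> column : columns.entrySet()) {
            // Assumption: all columns are nullable.
            fields.add(DataTypes.createStructField(column.getKey(), column.getValue(), true));
        }
        return DataTypes.createStructType(fields);
    }

    public static void main(String[] args) {
        // Hypothetical columns standing in for the custom schema class.
        Map<String, DataType> columns = new LinkedHashMap<>();
        columns.put("name", DataTypes.StringType);
        columns.put("age", DataTypes.IntegerType);

        System.out.println(viaAdd(columns));
        System.out.println(viaList(columns));
    }
}
```

Route 1 allocates one intermediate StructType per column and copies the field array each time, which is unlikely to matter for a schema with a handful of columns. Route 2 builds the StructType once, at the cost of the temporary list.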


 