Creating Spark SQL's StructType: use add method or a constructor?

I am creating a StructType from the schema of a custom Java class, from which I can extract each column's name and data type.

From what I know, there are two ways to construct a StructType:

  1. Use the add method
  2. Use the constructor, passing in an array of StructField

I can use either approach, since I loop through my custom schema class and extract the fields one by one. However, the add method appears to create a new StructType every time it is called, which seems an unnecessarily complicated way to handle this, so I am wondering whether it really does create a new object on each call. If it does not, I figure add is a better option than building a new ArrayList of StructField.

If you check the source code of the StructType class, you will see that the add method invokes the StructType constructor with the existing fields plus a new StructField, so it does create a new StructType on every call.

def add(name: String, dataType: DataType): StructType = {
    StructType(fields :+ new StructField(name, dataType, nullable = true, Metadata.empty))
}
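
For readers less familiar with Scala, `fields :+ x` builds a new array with `x` appended and leaves `fields` untouched. The following is a minimal plain-Java sketch of that copy-on-add pattern. `ImmutableSchema` and `ImmutableAddDemo` are hypothetical stand-ins invented for illustration, not Spark classes.

```java
import java.util.Arrays;

// Hypothetical stand-in for StructType, used only to illustrate the copy-on-add pattern.
final class ImmutableSchema {
    private final String[] fields;

    ImmutableSchema(String... fields) {
        this.fields = fields.clone();
    }

    // Same shape as the Scala add above: copy the array, append, wrap in a new object.
    ImmutableSchema add(String name) {
        String[] copy = Arrays.copyOf(fields, fields.length + 1);
        copy[fields.length] = name;
        return new ImmutableSchema(copy);
    }

    int size() {
        return fields.length;
    }
}

public class ImmutableAddDemo {
    public static void main(String[] args) {
        ImmutableSchema s1 = new ImmutableSchema("name");
        s1.add("age");                       // result discarded, s1 is unchanged
        ImmutableSchema s2 = s1.add("age");  // result kept under a new reference
        System.out.println(s1.size());       // prints 1
        System.out.println(s2.size());       // prints 2
        System.out.println(s1 == s2);        // prints false
    }
}
```

The practical consequence is that the return value of add must always be assigned; calling it for its side effect does nothing.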

You can verify this with the sample program below.

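import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class QuickTest {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession
                .builder()
                .appName("QuickTest")
                .master("local[*]")
                .getOrCreate();

        // Start with a one-field StructType
        StructType st1 = new StructType().add("name", DataTypes.StringType);
        System.out.println("hashCode " + st1.hashCode());
        System.out.println("structType " + st1.toString());

        // Call add but discard the result: st1 itself is not modified
        st1.add("age", DataTypes.IntegerType);
        System.out.println("hashCode " + st1.hashCode());
        System.out.println("structType " + st1.toString());

        // Call add and keep the result: st2 is a new, two-field StructType
        StructType st2 = st1.add("age", DataTypes.IntegerType);
        System.out.println("hashCode " + st2.hashCode());
        System.out.println("structType " + st2.toString());

        // Constructor with an explicit StructField array;
        // Metadata.empty() is the same metadata that add uses
        StructType st3 = new StructType(new StructField[] {
                new StructField("name", DataTypes.StringType, true, Metadata.empty()),
                new StructField("age", DataTypes.IntegerType, true, Metadata.empty())
        });
        System.out.println("hashCode " + st3.hashCode());
        System.out.println("structType " + st3.toString());

        sparkSession.stop();
    }
}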
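
To relate this back to the question, here is a rough sketch of both approaches applied in a loop. It assumes the custom schema class can be reduced to an ordered map of column name to Spark DataType. That map and the names `SchemaBuilder`, `viaAdd`, and `viaFieldList` are assumptions made for illustration and do not appear in the original post. The Spark calls used are `StructType.add`, `DataTypes.createStructField`, and `DataTypes.createStructType`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper; the Map stands in for the custom schema class.
public class SchemaBuilder {

    // Approach 1: chain add(). Each iteration allocates a new StructType,
    // so the result must be reassigned.
    static StructType viaAdd(Map<String, DataType> columns) {
        StructType schema = new StructType();
        for (Map.Entry<String, DataType> column : columns.entrySet()) {
            schema = schema.add(column.getKey(), column.getValue());
        }
        return schema;
    }

    // Approach 2: collect the fields first, then build one StructType at the end.
    static StructType viaFieldList(Map<String, DataType> columns) {
        List<StructField> fields = new ArrayList<>();
        for (Map.Entry<String, DataType> column : columns.entrySet()) {
            // nullable = true matches the default used by add(name, dataType)
            fields.add(DataTypes.createStructField(column.getKey(), column.getValue(), true));
        }
        return DataTypes.createStructType(fields);
    }

    public static void main(String[] args) {
        Map<String, DataType> columns = new LinkedHashMap<>();
        columns.put("name", DataTypes.StringType);
        columns.put("age", DataTypes.IntegerType);

        // Compare the two resulting schemas by value
        System.out.println(viaAdd(columns).equals(viaFieldList(columns)));
    }
}
```

Both methods should yield equal schemas, since every field ends up with the same name, type, nullability, and empty metadata. The difference is only in allocation: the add loop creates one intermediate StructType per column, while the list version creates a single one. For schemas with a modest number of columns this cost is unlikely to matter, so readability can reasonably decide the choice.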
