Spark SQL table max column count

I am trying to create a Spark SQL table from an RDD in a Scala program, with a column count of 200+. Compilation (sbt compile) fails with a java.lang.StackOverflowError when I create my schema as:

StructField("RT", StringType,nullable = true) ::
StructField("SERIALNO", StringType,nullable = true) ::
StructField("SPORDER", StringType,nullable = true) ::
// ... remaining 200+ columns

I can't paste the stack trace, as it is more than 1.5k lines long.

When I reduce the column count to around 100-120, compilation succeeds. Compilation also succeeds when I create the schema from a schema string (splitting the string and then mapping over it), as in the first example under the heading "Programmatically Specifying the Schema" in https://spark.apache.org/docs/1.3.0/sql-programming-guide.html .

What is the problem with manually specifying the schema that results in this exception?

The basic issue here is that you are chaining a list prepend (cons) at each step for each StructField. The operator :: is actually a member of List, not StructField. While the code reads:

val fields = field1 :: field2 :: field3 :: Nil

This is equivalent to:

val fields = field1 :: (field2 :: (field3 :: Nil))

or even

val fields = Nil.::(field3).::(field2).::(field1)

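So the expression is nested as deeply as the list is long. Since your error shows up during sbt compile, it is the compiler, itself running on the JVM, that overflows: it processes the nested calls to the :: method recursively, so its stack depth grows in proportion to the number of items in the list. Splitting a string of field names and mapping over it works because the source expression stays shallow no matter how many fields there are; the list is built by iterating over the split names rather than by a deeply nested chain.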
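To make the desugaring concrete, here is a minimal sketch in plain Scala (no Spark needed). The object name ConsChainSketch and the three sample names are only illustrative. Because :: ends in a colon, it is right-associative and is invoked on its right operand, so in method-call form the last element is applied first.

```scala
object ConsChainSketch {
  // Infix form, as written in the question.
  val infix: List[String] = "RT" :: "SERIALNO" :: "SPORDER" :: Nil

  // The same expression with the implicit nesting made explicit.
  val nested: List[String] = "RT" :: ("SERIALNO" :: ("SPORDER" :: Nil))

  // The same expression as plain method calls on List: the rightmost
  // element is prepended to Nil first, the leftmost element last.
  val methodCalls: List[String] = Nil.::("SPORDER").::("SERIALNO").::("RT")

  // A flat alternative: the source expression does not get deeper
  // as more names are added to the string.
  val fromString: List[String] = "RT SERIALNO SPORDER".split(" ").toList

  def main(args: Array[String]): Unit = {
    println(infix == nested && nested == methodCalls) // prints true
    println(fromString == infix)                      // prints true
  }
}
```

With three elements the nesting is harmless; the point is only that each extra :: adds one more level to the first three forms, while the last form stays flat.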

This is not a Spark issue. You can reproduce the same stack overflow error with a long chain of :: on a List of any type in the Scala REPL once you get into the hundreds of items. Just use one of the other approaches to creating your list of StructFields that doesn't cause a stack overflow.

For example, something like this will work just fine:

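import org.apache.spark.sql.types.{StringType, StructField, StructType}

// List(...) is a single flat call, so nesting depth does not grow with the field count.
val structure = StructType(
  List(
    StructField("RT", StringType, nullable = true),
    StructField("SERIALNO", StringType, nullable = true),
    StructField("SPORDER", StringType, nullable = true),
    // Other Fields
    StructField("LASTFIELD", StringType, nullable = true)
  )
)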
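When every column has the same type, as in the question, the schema can also be generated from the column names, which is roughly what the schema-string approach does. The following is only a sketch: it assumes Spark SQL 1.3 or later on the classpath, and the object name SchemaFromNames and the shortened columnNames list are hypothetical stand-ins for the real 200+ names.

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SchemaFromNames {
  // Hypothetical, shortened column list; the real one would hold every column name.
  val columnNames: Seq[String] = Seq("RT", "SERIALNO", "SPORDER")

  // One StructField per name, built by mapping instead of chaining ::.
  val schema: StructType = StructType(
    columnNames.map(name => StructField(name, StringType, nullable = true))
  )

  def main(args: Array[String]): Unit = {
    // Print the field names in schema order.
    println(schema.fieldNames.mkString(", "))
  }
}
```

This keeps the source short as well, but it only fits when the fields share a type and nullability; columns with differing types still need the explicit List(...) form above.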
