
Spark - copy a field using df.schema.copy functions for another dataframe

I need to create a schema that reuses a field from an existing DataFrame.

Consider this example DataFrame:

scala> case class prd (a:Int, b:Int)
defined class prd

scala> val df = Seq((Array(prd(10,20),prd(15,30),prd(20,25)))).toDF("items")
df: org.apache.spark.sql.DataFrame = [items: array<struct<a:int,b:int>>]

scala> df.printSchema
root
 |-- items: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: integer (nullable = false)
 |    |    |-- b: integer (nullable = false)

I need one more field, "items_day1", with the same structure as "items", for df2. Right now I am doing it as shown below, which is only a workaround:

scala> val df2=df.select('items,'items.as("items_day1"))
df2: org.apache.spark.sql.DataFrame = [items: array<struct<a:int,b:int>>, items_day1: array<struct<a:int,b:int>>]

scala> df2.printSchema
root
 |-- items: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: integer (nullable = false)
 |    |    |-- b: integer (nullable = false)
 |-- items_day1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: integer (nullable = false)
 |    |    |-- b: integer (nullable = false)



But how can I get the same result using the df.schema.add() or df.schema.copy() functions?

EDIT1:

This is what I have tried so far:

val (a,b) = (df.schema,df.schema) // works
a("items")  //works
b.add(a("items").as("items_day1")) //Error.. 

To add a new field to your DataFrame schema (which is a StructType) with the same structure as an existing field but a different top-level name, you can copy the existing StructField with a modified name member, as shown below:

import org.apache.spark.sql.types._

case class prd (a:Int, b:Int)

val df = Seq((Array(prd(10,20), prd(15,30), prd(20,25)))).toDF("items")

val schema = df.schema
// schema: org.apache.spark.sql.types.StructType = StructType(
//   StructField(items, ArrayType(
//     StructType(StructField(a,IntegerType,false), StructField(b,IntegerType,false)
//   ), true), true)
// )

val newSchema = schema.find(_.name == "items") match {
  case Some(field) => schema.add(field.copy(name = "items_day1"))
  case None        => schema
}
// newSchema: org.apache.spark.sql.types.StructType = StructType(
//   StructField(items, ArrayType(
//     StructType(StructField(a,IntegerType,false), StructField(b,IntegerType,false)
//   ), true), true),
//   StructField(items_day1, ArrayType(
//     StructType(StructField(a,IntegerType,false), StructField(b,IntegerType,false)
//   ), true), true)
// )
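As a complementary sketch, the same result can also be reached by looking the field up by name and reusing its data type. The schema below is built by hand (an assumption made only so the snippet runs without a SparkSession); the names element, viaCopy and viaAdd are illustrative and do not come from the original post.

```scala
import org.apache.spark.sql.types._

// Hand-built equivalent of df.schema from the example above,
// so this sketch does not need a running SparkSession.
val element = StructType(Seq(
  StructField("a", IntegerType, nullable = false),
  StructField("b", IntegerType, nullable = false)))
val schema = StructType(Seq(
  StructField("items", ArrayType(element, containsNull = true), nullable = true)))

// StructType.apply(name) returns the matching StructField
// and throws if no field with that name exists.
val items = schema("items")

// Variant 1: copy the StructField under a new name.
// This keeps the data type, nullability and metadata of the original.
val viaCopy = schema.add(items.copy(name = "items_day1"))

// Variant 2: reuse only the data type.
// This overload of add creates a nullable field with no metadata.
val viaAdd = schema.add("items_day1", items.dataType)

// StructType is immutable: add returns a new StructType
// and leaves the original schema unchanged.
println(viaCopy.treeString)
```

Because "items" is itself nullable and carries no metadata, both variants should produce the same schema here. For a non-nullable field or one with metadata, only the copy variant preserves those attributes.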
