How to insert a few columns into a table using Spark
I am getting some fields from multiple tables using joins in a select statement (5 fields).
I have to insert these values into another table, Table_B, which has more columns (10 columns).
How do I insert these values into Table_B, e.g. Col1 into the id column, Col2 into Alias, Col3 into emp_age, and Col4 into occupation?
First, I am getting the result of the multiple joins in a Dataset:
Dataset<Row> exlCompaniesDataset = sparkSession.sql("Select query with multiple inner joins");
How do I get each column's values from the Dataset and insert them into Table_B?
You essentially need to solve two issues: pad the source Dataset with the columns it is missing relative to the target table, and map the source column names to the target column names.
Below is sample code to do this. I have shown the spark-shell output in places to make it easier to follow.
import scala.util.Try
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
//Input Data
val inputDF = spark.sql("SELECT '1' as Col1,'name' as Col2,'age' as Col3,'job' as Col4")
scala> inputDF.show(false)
+----+----+----+----+
|Col1|Col2|Col3|Col4|
+----+----+----+----+
|1 |name|age |job |
+----+----+----+----+
//make same number of columns as needed in target
val inputColList = List("Col1","Col2","Col3","Col4","Col5","Col6","Col7","Col8","Col9","Col10")
var newDf = inputDF
//Loop and add any missing columns as Null
inputColList.foreach( fieldName => {
if(!Try(newDf(fieldName)).isSuccess)
newDf = newDf.withColumn(fieldName, lit(null).cast(StringType))
})
scala> newDf.show(false)
+----+----+----+----+----+----+----+----+----+-----+
|Col1|Col2|Col3|Col4|Col5|Col6|Col7|Col8|Col9|Col10|
+----+----+----+----+----+----+----+----+----+-----+
|1 |name|age |job |null|null|null|null|null|null |
+----+----+----+----+----+----+----+----+----+-----+
//Create a Source to Target Map.
val srcTgtMap = Map ("Col1"->"id","Col2"->"Alias","Col3"->"ten_id","Col4"->"occupation","Col5"->"emp__age","Col6"->"Salary","Col7"->"created_date","Col8"->"Modified_date","Col9"->"some_col1","Col10"->"col_col2" )
//Iterate to get new column names.
val colMappingList = srcTgtMap.keys.map(key => col(key).as(srcTgtMap(key))).toList
val dfRenamed = newDf.select(colMappingList: _*)
scala> dfRenamed.show(false)
+--------+--------+------+-------------+------+------------+----------+---------+---+-----+
|col_col2|emp__age|ten_id|Modified_date|Salary|created_date|occupation|some_col1|id |Alias|
+--------+--------+------+-------------+------+------------+----------+---------+---+-----+
|null |null |age |null |null |null |job |null |1 |name |
+--------+--------+------+-------------+------+------------+----------+---------+---+-----+
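Once the DataFrame has the right columns, the last step is the actual insert. A minimal sketch, assuming Table_B already exists in a catalog the SparkSession can see (e.g. a Hive table); note that `insertInto` resolves columns by position, not by name, so reorder the DataFrame to match the target schema first:

```scala
// Align column order with Table_B's schema, since insertInto is
// position-based, then append the rows to the existing table.
val targetCols = spark.table("Table_B").columns.map(col(_))

dfRenamed
  .select(targetCols: _*)   // same column order as Table_B
  .write
  .mode("append")           // keep existing rows in Table_B
  .insertInto("Table_B")
```

If the target table does not exist yet, `dfRenamed.write.saveAsTable("Table_B")` would create it instead.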
Hope that helps!!