How to insert a few columns into a table using Spark
I am getting some fields from multiple tables using joins in a select statement (5 fields).
I have to insert these values into another table, Table_B, which has more columns (10 columns).
How do I insert these values into Table_B, e.g. Col1 into the id column, Col2 into Alias, Col3 into emp_age, and Col4 into occupation?
First, I am getting the result of the multiple joins in a Dataset:
Dataset<Row> exlCompaniesDataset = sparkSession.sql("Select query with multiple inner joins");
How do I get each column's values from the Dataset and insert them into Table_B?
You essentially need to solve two issues: pad the source Dataset with the columns it is missing relative to the target table, and map the source column names to the target column names.
Below is sample code to do this. I have shown the spark-shell output in places to make it easier to follow.
import scala.util.Try
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
//Input Data
val inputDF = spark.sql("SELECT '1' as Col1,'name' as Col2,'age' as Col3,'job' as Col4")
scala> inputDF.show(false)
+----+----+----+----+
|Col1|Col2|Col3|Col4|
+----+----+----+----+
|1 |name|age |job |
+----+----+----+----+
//make same number of columns as needed in target
val inputColList = List("Col1","Col2","Col3","Col4","Col5","Col6","Col7","Col8","Col9","Col10")
var newDf = inputDF
//Loop and add any missing columns as Null
inputColList.foreach( fieldName => {
if(!Try(newDf(fieldName)).isSuccess)
newDf = newDf.withColumn(fieldName, lit(null).cast(StringType))
})
scala> newDf.show(false)
+----+----+----+----+----+----+----+----+----+-----+
|Col1|Col2|Col3|Col4|Col5|Col6|Col7|Col8|Col9|Col10|
+----+----+----+----+----+----+----+----+----+-----+
|1 |name|age |job |null|null|null|null|null|null |
+----+----+----+----+----+----+----+----+----+-----+
//Create a Source to Target Map.
val srcTgtMap = Map ("Col1"->"id","Col2"->"Alias","Col3"->"ten_id","Col4"->"occupation","Col5"->"emp__age","Col6"->"Salary","Col7"->"created_date","Col8"->"Modified_date","Col9"->"some_col1","Col10"->"col_col2" )
//Iterate to get new column names.
val colMappingList = srcTgtMap.keys.map(key => col(key).as(srcTgtMap(key))).toList
val dfRenamed = newDf.select(colMappingList: _*)
scala> dfRenamed.show(false)
+--------+--------+------+-------------+------+------------+----------+---------+---+-----+
|col_col2|emp__age|ten_id|Modified_date|Salary|created_date|occupation|some_col1|id |Alias|
+--------+--------+------+-------------+------+------------+----------+---------+---+-----+
|null |null |age |null |null |null |job |null |1 |name |
+--------+--------+------+-------------+------+------------+----------+---------+---+-----+
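Once the DataFrame has the right columns, the last step is the actual insert. A minimal sketch, assuming Table_B already exists in a catalog the SparkSession can see (e.g. a Hive table); note that `insertInto` resolves columns by position, not by name, so reorder the DataFrame to match the target schema first:

```scala
// Align column order with Table_B's schema, since insertInto is
// position-based, then append the rows to the existing table.
val targetCols = spark.table("Table_B").columns.map(col(_))

dfRenamed
  .select(targetCols: _*)   // same column order as Table_B
  .write
  .mode("append")           // keep existing rows in Table_B
  .insertInto("Table_B")
```

If the target table does not exist yet, `dfRenamed.write.saveAsTable("Table_B")` would create it instead.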
Hope that helps!!