How to insert a few columns into a table using Spark

I am selecting five fields from multiple tables with a single SELECT statement that uses joins.

I have to insert these values into another table, Table-B, which has more columns (10 columns).

How can I insert these values into Table_B so that Col1 goes into the id column, Col2 into Alias, Col3 into emp_age, and Col4 into occupation?

I get the result of the joins as a Dataset:

Dataset<Row> exlCompaniesDataset = sparkSession.sql("Select query with multiple inner joins");

How do I get each column's values from the Dataset and insert them into Table-B?

You essentially need to solve two issues:

  1. Give the initial DataFrame the same number of columns as the target table.
  2. Rename the columns from the source names to the target names.

Below is sample code to do this. I have shown the spark-shell output in places to make it easier to follow.

import scala.util.Try
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

//Input Data
val inputDF = spark.sql("SELECT '1' as Col1,'name' as Col2,'age' as Col3,'job' as Col4")
scala> inputDF.show(false)
+----+----+----+----+
|Col1|Col2|Col3|Col4|
+----+----+----+----+
|1   |name|age |job |
+----+----+----+----+

//make same number of columns as needed in target
val inputColList = List("Col1","Col2","Col3","Col4","Col5","Col6","Col7","Col8","Col9","Col10")

//Add any column that is missing from the input as a null string column
val newDf = inputColList.foldLeft(inputDF) { (df, fieldName) =>
  if (Try(df(fieldName)).isSuccess) df
  else df.withColumn(fieldName, lit(null).cast(StringType))
}

scala> newDf.show(false)
+----+----+----+----+----+----+----+----+----+-----+
|Col1|Col2|Col3|Col4|Col5|Col6|Col7|Col8|Col9|Col10|
+----+----+----+----+----+----+----+----+----+-----+
|1   |name|age |job |null|null|null|null|null|null |
+----+----+----+----+----+----+----+----+----+-----+

//Create an ordered source-to-target mapping.
//A Seq of pairs keeps the declared order; a Map with more than four entries does not.
val srcTgtPairs = Seq("Col1"->"id","Col2"->"Alias","Col3"->"ten_id","Col4"->"occupation","Col5"->"emp__age","Col6"->"Salary","Col7"->"created_date","Col8"->"Modified_date","Col9"->"some_col1","Col10"->"col_col2")

//Build the renamed columns in target order.
val colMappingList = srcTgtPairs.map { case (src, tgt) => col(src).as(tgt) }
val dfRenamed = newDf.select(colMappingList: _*)

scala> dfRenamed.show(false)
+---+-----+------+----------+--------+------+------------+-------------+---------+--------+
|id |Alias|ten_id|occupation|emp__age|Salary|created_date|Modified_date|some_col1|col_col2|
+---+-----+------+----------+--------+------+------------+-------------+---------+--------+
|1  |name |age   |job       |null    |null  |null        |null         |null     |null    |
+---+-----+------+----------+--------+------+------------+-------------+---------+--------+
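
The remaining step is the actual write into Table-B. `DataFrameWriter.insertInto` matches columns by position rather than by name, so the select list has to follow Table-B's column order. Below is a minimal sketch of one way to get that, assuming the mapping used above is already listed in Table-B's order. `TableBProjection`, `srcToTgt` and `selectExprs` are hypothetical names made up for this illustration, not Spark APIs; the helper only builds SQL expression strings, so it can be checked without a Spark session.

```scala
// Hypothetical helper (not part of Spark): builds one SQL select expression
// per target column, keeping the order in which the pairs are declared.
object TableBProjection {

  // Ordered (source column -> target column) pairs, repeated from the
  // mapping above. Assumption: this is also Table-B's column order.
  val srcToTgt: Seq[(String, String)] = Seq(
    "Col1"  -> "id",
    "Col2"  -> "Alias",
    "Col3"  -> "ten_id",
    "Col4"  -> "occupation",
    "Col5"  -> "emp__age",
    "Col6"  -> "Salary",
    "Col7"  -> "created_date",
    "Col8"  -> "Modified_date",
    "Col9"  -> "some_col1",
    "Col10" -> "col_col2"
  )

  // Source columns that exist are renamed; missing ones become NULL strings.
  // Casting to STRING mirrors the code above; Table-B's real column types
  // may call for different casts.
  def selectExprs(sourceCols: Seq[String],
                  mapping: Seq[(String, String)]): Seq[String] =
    mapping.map { case (src, tgt) =>
      if (sourceCols.contains(src)) s"$src AS $tgt"
      else s"CAST(NULL AS STRING) AS $tgt"
    }
}
```

In spark-shell the expressions could then be applied with `inputDF.selectExpr(TableBProjection.selectExprs(inputDF.columns.toSeq, TableBProjection.srcToTgt): _*).write.mode("append").insertInto("Table_B")`, assuming `Table_B` already exists in the catalog under that name.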

Hope that helps!

