
Cannot load spark json dataframe into a hive table

I want to convert a dataframe into a JSON object and load it into a JSON table.

Below is the code.

Create the table:

spark.sql("""create table IF NOT EXISTS user_tech.tests (
Z struct<A:string, 
 B:string,
 C:string>
)
stored as orc """)

import org.apache.spark.sql._

Initial data frame:

val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")    


val jsonColumns = df.select("A", "B", "C")

Convert it into JSON:

import org.apache.spark.sql.functions._
val finalDF = jsonColumns.select(to_json(struct(col("A"), col("B"), col("C")))).as("Z")

Insert the rows into the table:

finalDF.registerTempTable("test")

spark.sql(""" select * from test """).show()

spark.sql("""Insert into  user_tech.tests select * from test""")

I am getting the following error:

org.apache.spark.sql.AnalysisException: cannot resolve 'test.`structstojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C))`' due to data type mismatch: cannot cast StringType to StructType(StructField(guid,StringType,true), StructField(sessionid,StringType,true));;

The problem is with the following statement.

val finalDF = jsonColumns.select(to_json(struct(col("A"), col("B"), col("C")))).as("Z")

A quick inspection of the above DataFrame shows that you are creating a single column of String type.

scala> finalDF.show
+--------------------------------------------------------------------------------------------+
|structtojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C))|
+--------------------------------------------------------------------------------------------+
|                                                                         {"A":1,"B":2,"C":3}|
|                                                                         {"A":2,"B":3,"C":4}|
+--------------------------------------------------------------------------------------------+


scala> finalDF.printSchema
root
 |-- structtojson(named_struct(NamePlaceholder(), A, NamePlaceholder(), B, NamePlaceholder(), C)): string (nullable = true)

And when you try to insert from the temp table registered on finalDF, the schema doesn't match the table's struct column, and you get the exception.

The following should work for you:

spark.sql("""create table IF NOT EXISTS tests (
Z struct<A:string, 
 B:string,
 C:string>
)
stored as orc """)


import org.apache.spark.sql._

val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")    

val jsonColumns = df.select("A", "B", "C")

jsonColumns.registerTempTable("tmp")

spark.sql("""Insert into tests select struct(*) from tmp""")

You can see the data using the following statement.

spark.sql("select * from tests").show


+-------+
|      Z|
+-------+
|[1,2,3]|
|[1,2,3]|
|[2,3,4]|
|[2,3,4]|
+-------+
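If you prefer to stay in the DataFrame API rather than Spark SQL, an equivalent fix is to build the struct column directly with struct(...) and alias it to match the table's column. This is a sketch, assuming the user_tech.tests table from the question already exists; DataFrameWriter.insertInto does a positional insert into that table.

```scala
import org.apache.spark.sql.functions.struct

// struct(...) yields a StructType column, unlike to_json(...),
// which yields a plain String column.
val structDF = df.select(struct(df("A"), df("B"), df("C")).as("Z"))

structDF.printSchema
// root
//  |-- Z: struct (nullable = false)
//  |    |-- A: integer (nullable = false)
//  |    |-- B: integer (nullable = false)
//  |    |-- C: integer (nullable = false)

// The schema now lines up with the table definition, so the insert succeeds:
structDF.write.insertInto("user_tech.tests")
```

Also note that in the original attempt, .as("Z") was applied to the DataFrame (an alias on the whole dataset), not to the to_json column; but even with the alias moved inside the select, the column would still be a String and the insert would still fail.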

Hope that helps!
