[英]Save table in hive with java spark sql from json array
Dataset<Row> ds = spark.read().option("multiLine", true).option("mode", "PERMISSIVE").json("/user/administrador/prueba_diario.txt").toDF();
ds.printSchema();
Dataset<Row> ds2 = ds.select("articles").toDF();
ds2.printSchema();
spark.sql("drop table if exists table1");
ds2.write().saveAsTable("table1");
I have this json format 我有这个json格式
root
|-- articles: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- author: string (nullable = true)
| | |-- content: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- publishedAt: string (nullable = true)
| | |-- source: struct (nullable = true)
| | | |-- id: string (nullable = true)
| | | |-- name: string (nullable = true)
| | |-- title: string (nullable = true)
| | |-- url: string (nullable = true)
| | |-- urlToImage: string (nullable = true)
|-- status: string (nullable = true)
|-- totalResults: long (nullable = true)
I want to save the array articles as a hive's table with the arrays format 我想使用数组格式将数组文章另存为配置单元表
example of hive table that i want: 我想要的蜂巢表的示例:
author (string)
content (string)
description (string)
publishedat (string)
source (struct<id:string,name:string>)
title (string)
url (string)
urltoimage (string)
The problem is that is saving the table just with one column named article and the contend is inside in this only column 问题是只用一个名为article的列保存了表,而竞争只在该列中
A bit convoluted, but I found this one to work: 有点令人费解,但是我发现这个可以工作:
import org.apache.spark.sql.functions._
ds.select(explode(col("articles")).as("exploded")).select("exploded.*").toDF()
I tested it on 我测试了
{
"articles": [
{
"author": "J.K. Rowling",
"title": "Harry Potter and the goblet of fire"
},
{
"author": "George Orwell",
"title": "1984"
}
]
}
and it returned (after collecting it into an array) 然后返回(将其收集到数组中之后)
result = {Arrays$ArrayList@13423} size = 2
0 = {GenericRowWithSchema@13425} "[J.K. Rowling,Harry Potter and the goblet of fire]"
1 = {GenericRowWithSchema@13426} "[George Orwell,1984]"
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.