
DataFrame to Json Array in Spark

I am writing a Spark application in Java that reads a Hive table and stores the output in HDFS in JSON format.

I read the Hive table using HiveContext, which returns a DataFrame. Below is the code snippet.

SparkConf conf = new SparkConf().setAppName("App");
JavaSparkContext sc = new JavaSparkContext(conf);
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc);

DataFrame data1 = hiveContext.sql("select * from tableName");

Now I want to convert the DataFrame to a JSON array. For example, data1 looks like this:

|  A  |     B     |
-------------------
|  1  | test      |
|  2  | mytest    |

I need output like this:

[{1:"test"},{2:"mytest"}]

I tried using data1.schema.json(), but it gives me output like the following, which is not an array.

{1:"test"}
{2:"mytest"}

What is the right approach or function to convert the DataFrame to a JSON array without using any third-party libraries?

data1.schema.json gives you a JSON string containing the schema of the DataFrame, not the actual data. You will get:

String = {"type":"struct",
          "fields":
                  [{"name":"A","type":"integer","nullable":false,"metadata":{}},
                  {"name":"B","type":"string","nullable":true,"metadata":{}}]}

To convert your DataFrame to an array of JSON, use the toJSON method of DataFrame:

val df = sc.parallelize(Array( (1, "test"), (2, "mytest") )).toDF("A", "B")
df.show()

+---+------+
|  A|     B|
+---+------+
|  1|  test|
|  2|mytest|
+---+------+

df.toJSON.collect.mkString("[", "," , "]" )
String = [{"A":1,"B":"test"},{"A":2,"B":"mytest"}]

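In Java, you can do it the following way:

String jsonToReturn = df.toJSON().collectAsList().toString();
return jsonToReturn;

You can then return it as the response if this runs on the server side. Note that collectAsList() is available when toJSON() returns a Dataset<String> (Spark 2.x); on Spark 1.x, where toJSON() returns an RDD<String>, df.toJSON().toJavaRDD().collect() gives the equivalent list.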
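List.toString() separates elements with a comma followed by a space, so its result differs slightly from the Scala mkString("[", ",", "]") output shown earlier. The following is a minimal plain-Java sketch of that mkString step, assuming the input list holds the per-row JSON strings collected from toJSON(). The class name JsonArrayJoin and the method toJsonArray are hypothetical names used only for illustration; they are not part of Spark.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the Spark API.
public class JsonArrayJoin {

    // Joins already-serialized JSON objects with commas and wraps them in
    // brackets, mirroring Scala's mkString("[", ",", "]").
    public static String toJsonArray(List<String> jsonRows) {
        return "[" + String.join(",", jsonRows) + "]";
    }

    public static void main(String[] args) {
        // Stand-in for the row strings collected from the DataFrame
        // (an assumption; no Spark dependency is needed to run this sketch).
        List<String> rows = Arrays.asList(
                "{\"A\":1,\"B\":\"test\"}",
                "{\"A\":2,\"B\":\"mytest\"}");

        // Prints: [{"A":1,"B":"test"},{"A":2,"B":"mytest"}]
        System.out.println(toJsonArray(rows));
    }
}
```

Because the rows are collected to the driver before joining, this approach is only suitable when the result is small enough to fit in driver memory.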

