
Loading JSON data into a PairRDD in Spark using Java

I am very new to Spark.

I have a very basic question. I read a file into a Spark RDD in which each line is a JSON record. I want to apply groupBy-like transformations, so I want to turn each JSON line into a PairRDD. Is there a straightforward way to do this in Java?

My JSON looks like this:

{
    "tmpl": "p",
    "bw": "874",
    "aver": {"cnac": "US", "t1": "2"}
}

Currently, my approach is to split by , first and then by : . Is there any straightforward way to do this?

My current code:

val pairs = setECrecords.flatMap(x => x.split(","))   // split each line on ","
pairs.foreach(println)

// turn each fragment into a (key, value) tuple, splitting it only once
val pairsastuple = pairs.map { x =>
  val parts = x.split("=")
  if (parts.length > 1) (parts(0), parts(1)) else (parts(0), x)
}

You can try mapToPair(), but using the Spark SQL & DataFrames API will let you group things much more easily. The DataFrames API allows you to load JSON data directly.
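To make that concrete, here is a minimal Java sketch of both suggestions. The input path records.json, the grouping column aver.cnac, the pair key tmpl, and the use of Jackson for parsing are assumptions made for illustration, not part of the original question:

import org.apache.spark.api.java.JavaPairRDDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import scala.Tuple2;

public class JsonPairExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-pair-rdd")
                .master("local[*]")
                .getOrCreate();

        // Option 1: DataFrame API. Spark expects one JSON object per line and
        // infers the schema, so groupBy works directly on (nested) columns.
        Dataset<Row> records = spark.read().json("records.json"); // assumed path
        records.printSchema();
        records.groupBy("aver.cnac").count().show();

        // Option 2: mapToPair() on the raw text lines, parsing each line with
        // Jackson and keying the pair RDD on an assumed field ("tmpl").
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        JavaRDD<String> lines = jsc.textFile("records.json");
        JavaPairRDD<String, String> pairs = lines.mapToPair(line -> {
            JsonNode node = new ObjectMapper().readTree(line);
            return new Tuple2<>(node.get("tmpl").asText(), line);
        });
        pairs.groupByKey()
             .foreach(kv -> System.out.println(kv._1() + " -> " + kv._2()));

        spark.stop();
    }
}

The DataFrame route is usually the shorter one: spark.read().json() infers the schema, and groupBy then works on named columns, including nested ones, with no manual string splitting.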
