I am working with Spark SQL (Spark 2.0) and using the Java API to read a CSV file.
The CSV file contains a double-quoted column whose value is itself comma-separated, e.g. "Express Air,Delivery Truck".
Code for reading the CSV and returning a Dataset:
Dataset<Row> df = spark.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load(filename);
Result:
+-----+--------------+--------------------------+
|Year | State | Ship Mode |...
+-----+--------------+--------------------------+
|2012 |New York |Express Air,Delivery Truck|...
|2013 |Nevada |Delivery Truck |...
|2013 |North Carolina|Regular Air,Delivery Truck|...
+-----+--------------+--------------------------+
But I want to split the Ship Mode column into Mode1 and Mode2 columns and return the result as a Dataset:
+-----+--------------+--------------+---------------+
|Year | State | Mode1 | Mode2 |...
+-----+--------------+--------------+---------------+
|2012 |New York |Express Air |Delivery Truck |...
|2013 |Nevada |Delivery Truck|null |...
|2013 |North Carolina|Regular Air |Delivery Truck |...
+-----+--------------+--------------+---------------+
Is there any way to do this with the Java Spark API?
I tried a MapFunction, but its call() method does not return a Row. The Ship Mode value is dynamic, i.e. the CSV may contain one ship mode or two.
Thanks.
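You can use selectExpr, a variant of select that accepts SQL expressions. Because the column name Ship Mode contains a space, it has to be quoted with backticks inside the expression:
df.selectExpr("Year", "State", "split(`Ship Mode`, ',')[0] as Mode1", "split(`Ship Mode`, ',')[1] as Mode2");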
The result is a Dataset of Row.
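Alternatively, we could define a UDF that performs the split once and then select its two elements as separate columns, e.g. in Scala: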
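import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Column, Row}
import spark.implicits._ // enables the $"..." column syntax

// Split once; lift turns a missing second element into None instead of throwing
val splitter = udf((str: String) => {
  val splitted = str.split(",").lift
  Array(splitted(0), splitted(1))
})
// Column names match the CSV header shown in the question
val dfShipMode = df.select($"Year", $"State", splitter($"Ship Mode") as "modes")
  .select($"Year", $"State", $"modes"(0) as "Mode1", $"modes"(1) as "Mode2")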
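Since the question asks about Java, the splitting logic inside the UDF above could be sketched in plain Java roughly as follows. This is only an illustration of the "one or two modes" handling, not Spark code: the class name ShipModeSplitter and the method splitModes are hypothetical, and it assumes at most the first two comma-separated values matter.

```java
import java.util.Arrays;

public class ShipModeSplitter {

    // Hypothetical helper: always returns exactly two slots,
    // leaving the second (or both) as null when a value is missing.
    public static String[] splitModes(String shipMode) {
        String[] modes = new String[2];
        if (shipMode == null) {
            return modes; // both slots stay null
        }
        String[] parts = shipMode.split(",");
        // Copy at most the first two values; extra values are ignored
        for (int i = 0; i < Math.min(parts.length, 2); i++) {
            modes[i] = parts[i];
        }
        return modes;
    }

    public static void main(String[] args) {
        // prints [Express Air, Delivery Truck]
        System.out.println(Arrays.toString(splitModes("Express Air,Delivery Truck")));
        // prints [Delivery Truck, null]
        System.out.println(Arrays.toString(splitModes("Delivery Truck")));
    }
}
```

A method like this could presumably be wrapped in a Java UDF or called from a MapFunction, but that wiring is left out here because it depends on the encoder and schema you choose.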