I have two data sets.

items:
itemname itemId coupons
A 1 true
A 2 false

purchases:
itemname purchases
B 10
A 10
C 10

I need to get:
itemname itemId coupons purchases
A 1 true 10
A 2 false 10
I'm doing:
val mm = items.join(purchases, items("itemname") === purchases("itemname")).drop(items("itemname"))
Is this the correct way to do this in Spark with Scala?
This code:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Assumes an existing SparkSession named `spark`, as in spark-shell.
val itemsSchema = List(
  StructField("itemname", StringType, nullable = false),
  StructField("itemid", IntegerType, nullable = false),
  StructField("coupons", BooleanType, nullable = false))
val purchasesSchema = List(
  StructField("itemname", StringType, nullable = false),
  StructField("purchases", IntegerType, nullable = false))
val items = Seq(Row("A", 1, true), Row("A", 2, false))
val purchases = Seq(Row("A", 10), Row("B", 10), Row("C", 10))
val itemsDF = spark.createDataFrame(
  spark.sparkContext.parallelize(items),
  StructType(itemsSchema)
)
val purchasesDF = spark.createDataFrame(
  spark.sparkContext.parallelize(purchases),
  StructType(purchasesSchema)
)
// Joining on a Seq of column names keeps a single itemname column in the result.
purchasesDF.join(itemsDF, Seq("itemname")).show(false)
gives:
+--------+---------+------+-------+
|itemname|purchases|itemid|coupons|
+--------+---------+------+-------+
|A       |10       |1     |true   |
|A       |10       |2     |false  |
+--------+---------+------+-------+
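To answer the question directly: joining on an expression and then dropping one copy of the key column also works. Here is a minimal sketch comparing the two approaches, assuming a local SparkSession; the names `viaDrop` and `viaUsing` are made up for illustration. It puts `items` on the left, because a join on a `Seq` of column names lists the key first, then the remaining left-side columns, then the right-side ones, which matches the column order asked for in the question.

```scala
import org.apache.spark.sql.SparkSession

// Local session for this sketch only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
import spark.implicits._

val items = Seq(("A", 1, true), ("A", 2, false)).toDF("itemname", "itemId", "coupons")
val purchases = Seq(("B", 10), ("A", 10), ("C", 10)).toDF("itemname", "purchases")

// Approach from the question: join on an expression, then drop one copy of the key.
// Dropping the right-hand copy keeps itemname as the first column.
val viaDrop = items
  .join(purchases, items("itemname") === purchases("itemname"))
  .drop(purchases("itemname"))

// Approach from this answer: join on a Seq of column names, so the key appears once.
val viaUsing = items.join(purchases, Seq("itemname"))

viaDrop.show(false)
viaUsing.show(false)
```

Both are inner joins by default, so B and C, which have no matching row in `items`, do not appear in either result. The `Seq("itemname")` form is shorter and avoids the duplicate column in the first place.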
Hope this helps.