How to join two datasets in Scala?
I have two datasets:

itemname  itemId  coupons
A         1       true
A         2       false

itemname  purchases
B         10
A         10
C         10

I need to get:

itemname  itemId  coupons  purchases
A         1       true     10
A         2       false    10
I'm doing:
val mm = items.join(purchases, items("itemname") === purchases("itemname")).drop(items("itemname"))
Is this the correct way of doing this in Spark Scala?
This code:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType, StructField, StructType}

val itemsSchema = List(
  StructField("itemname", StringType, nullable = false),
  StructField("itemid", IntegerType, nullable = false),
  StructField("coupons", BooleanType, nullable = false))

val purchasesSchema = List(
  StructField("itemname", StringType, nullable = false),
  StructField("purchases", IntegerType, nullable = false))

val items = Seq(Row("A", 1, true), Row("A", 2, false))
val purchases = Seq(Row("A", 10), Row("B", 10), Row("C", 10))

val itemsDF = spark.createDataFrame(
  spark.sparkContext.parallelize(items),
  StructType(itemsSchema)
)
val purchasesDF = spark.createDataFrame(
  spark.sparkContext.parallelize(purchases),
  StructType(purchasesSchema)
)
purchasesDF.join(itemsDF, Seq("itemname")).show(false)
gives:
+--------+---------+------+-------+
|itemname|purchases|itemid|coupons|
+--------+---------+------+-------+
|A |10 |1 |true |
|A |10 |2 |false |
+--------+---------+------+-------+
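If it helps to see what the inner join is doing, the same pairing logic can be sketched with plain Scala collections (no Spark needed; the case classes here are just hypothetical stand-ins for the two DataFrames):

```scala
// Minimal sketch of inner-join semantics on itemname, using plain Scala
// collections instead of Spark DataFrames.
case class Item(itemname: String, itemid: Int, coupons: Boolean)
case class Purchase(itemname: String, purchases: Int)

val items = Seq(Item("A", 1, true), Item("A", 2, false))
val purchases = Seq(Purchase("A", 10), Purchase("B", 10), Purchase("C", 10))

// Inner join: keep only the (item, purchase) pairs whose itemname matches.
// Rows B and C from purchases have no matching item, so they drop out.
val joined = for {
  i <- items
  p <- purchases
  if i.itemname == p.itemname
} yield (i.itemname, i.itemid, i.coupons, p.purchases)

joined.foreach(println)
// (A,1,true,10)
// (A,2,false,10)
```

Spark distributes this same matching across partitions (typically via a shuffle or broadcast), but the result set is the same.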
Hope this helps.