How to skip empty RDD values when joining in Spark
I want to get two RDDs from Cassandra and then join them, skipping any rows whose value is empty.
def extractPair(rdd: RDD[CassandraRow]) = {
  rdd.map((row: CassandraRow) => {
    val name = row.getName("name")
    if (name == "")
      None // join wrong
    else
      (name, row.getUUID("object"))
  })
}
val rdd1 = extractPair(cassRdd1)
val rdd2 = extractPair(cassRdd2)
val joinRdd = rdd1.join(rdd2) // "None" join wrong
Using flatMap, as below, fixes this, but I would like to know how to fix it using map.
def extractPair(rdd: RDD[CassandraRow]) = {
  rdd.flatMap((row: CassandraRow) => {
    val name = row.getName("name")
    if (name == "")
      Seq() // empty sequence: this row contributes nothing
    else
      Seq((name, row.getUUID("object")))
  })
}
This isn't possible with just a map. You would need to follow it up with a filter. But you would still be best to wrap the valid result in a Some. But then you would still have it wrapped in a Some as a result, requiring a second map to unwrap it. So, realistically, your best option is something like this:
def extractPair(rdd: RDD[CassandraRow]) = {
  rdd.flatMap((row: CassandraRow) => {
    val name = row.getName("name")
    if (name == "") None
    else Some((name, row.getUUID("object")))
  })
}
Option is implicitly convertible to a flattenable type and conveys your method's message better.
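To make the comparison concrete, here is a minimal sketch of both approaches. It assumes plain Scala collections in place of RDDs so that it runs without Spark or Cassandra; `Row` and `SkipEmpty` are hypothetical stand-ins, not part of any library. The same map, filter and flatMap calls exist on RDD, so the chain should carry over, but treat this as an illustration rather than tested Spark code.

```scala
// Hypothetical stand-in for CassandraRow: just a name and an object id.
final case class Row(name: String, obj: Int)

object SkipEmpty {
  // Plain Seq used instead of an RDD (assumption, to keep the sketch runnable).
  val rows: Seq[Row] = Seq(Row("a", 1), Row("", 2), Row("b", 3))

  // map alone always yields one element per row, so empties cannot be dropped.
  // Wrapping valid results in Some keeps the element type uniform.
  val wrapped: Seq[Option[(String, Int)]] =
    rows.map(r => if (r.name == "") None else Some((r.name, r.obj)))

  // map + filter + map: drop the Nones, then unwrap the remaining Somes.
  val viaMap: Seq[(String, Int)] =
    wrapped.filter(_.isDefined).map(_.get)

  // flatMap: None contributes nothing, Some contributes its value.
  val viaFlatMap: Seq[(String, Int)] =
    rows.flatMap(r => if (r.name == "") None else Some((r.name, r.obj)))

  def main(args: Array[String]): Unit = {
    println(viaMap)
    println(viaFlatMap)
  }
}
```

Both versions produce the same pairs. The flatMap version does it in one pass and leaves the element type as a plain tuple, which is what join needs.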