Spark's Column.isin function does not take List
I am trying to filter rows out of my Spark DataFrame:
val sequence = Seq(1,2,3,4,5)
df.filter(df("column").isin(sequence))
Unfortunately, I get an unsupported literal type error:
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(1,2,3,4,5)
According to the documentation, it takes a scala.collection.Seq.

I guess I don't want a literal? Then what can I pass in, some sort of wrapper class?
@JustinPihony's answer is correct but incomplete: the isin function takes a repeated parameter (varargs) as its argument, so you'll need to pass it like so:
scala> val df = sc.parallelize(Seq(1,2,3,4,5,6,7,8,9)).toDF("column")
// df: org.apache.spark.sql.DataFrame = [column: int]
scala> val sequence = Seq(1,2,3,4,5)
// sequence: Seq[Int] = List(1, 2, 3, 4, 5)
scala> val result = df.filter(df("column").isin(sequence : _*))
// result: org.apache.spark.sql.DataFrame = [column: int]
scala> result.show
// +------+
// |column|
// +------+
// | 1|
// | 2|
// | 3|
// | 4|
// | 5|
// +------+
This is happening because the underlying Scala implementation uses varargs, so the Java documentation is not quite correct. The method carries the @varargs annotation, so from Java you can simply pass in an array.
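To see why passing the Seq directly fails, here is a minimal plain-Scala sketch (no Spark required) of a varargs method shaped like Column.isin. The `isin` here is a hypothetical stand-in for illustration, not Spark's actual method:

```scala
import scala.annotation.varargs

object VarargsDemo {
  // Hypothetical stand-in mirroring Column.isin's signature: a Scala
  // repeated parameter, annotated with @varargs so Java callers can
  // pass an array instead of a Seq.
  @varargs
  def isin(values: Any*): Set[Any] = values.toSet

  def main(args: Array[String]): Unit = {
    val sequence = Seq(1, 2, 3, 4, 5)

    // Passing the Seq directly treats it as ONE element of the varargs,
    // which is what triggers Spark's "unsupported literal type" error.
    val wrong = isin(sequence)
    println(wrong.size) // 1 -- a single List element

    // Expanding with `: _*` passes each element individually.
    val right = isin(sequence: _*)
    println(right.size) // 5 -- five Int elements
  }
}
```

The `: _*` type ascription is the standard Scala syntax for expanding any Seq (or Array) into a repeated-parameter argument list.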