How to filter a list in Spark with another column of the same DataFrame (version 2.2)
How to filter a Spark DataFrame if one column is a member of another column
I have a DataFrame with two columns (a string and an array of strings):
root
 |-- user: string (nullable = true)
 |-- users: array (nullable = true)
 |    |-- element: string (containsNull = true)
How can I filter the DataFrame so that the result contains only the rows where user is an element of users?
Quick and simple:
import org.apache.spark.sql.functions.expr
// array_contains is evaluated per row, so both arguments can be column names
df.where(expr("array_contains(users, user)"))
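For anyone who wants to try the one-liner end to end, here is a minimal sketch. The local SparkSession settings, the object name ArrayContainsExample, the helper membersOnly, and the sample rows are all assumptions made for illustration; they are not part of the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object ArrayContainsExample {
  // Hypothetical helper: keeps only rows whose `user` value occurs in `users`.
  // Assumes the input has a string column "user" and an array<string> column "users".
  def membersOnly(df: DataFrame): DataFrame =
    df.where(expr("array_contains(users, user)"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; master and app name are arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-contains-example")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a local Seq

    // Made-up sample rows: user "3" is not in its own users array.
    val df = Seq(
      ("1", Seq("1", "2", "3")),
      ("2", Seq("1", "2", "2", "3")),
      ("3", Seq("1", "2"))
    ).toDF("user", "users")

    // Only the rows for users "1" and "2" should remain.
    membersOnly(df).show()

    spark.stop()
  }
}
```

The filter is written as a SQL expression string because that lets both arguments be resolved as columns of the current row; since it stays a built-in expression, no UDF is involved.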
Sure, it's possible, and not that hard. To do it, you can use a UDF.
import org.apache.spark.sql.functions._
// toDF and the $"..." syntax need spark.implicits._ (imported automatically in spark-shell)
val df = sc.parallelize(Array(
  ("1", Array("1", "2", "3")),
  ("2", Array("1", "2", "2", "3")),
  ("3", Array("1", "2"))
)).toDF("user", "users")
// An array column reaches a Scala UDF as a Seq; the Boolean return type is inferred
val inArray = udf((id: String, array: Seq[String]) => array.contains(id))
// Keep only the rows where user appears in users
df.where(inArray($"user", $"users")).show()
The output is:
+----+------------+
|user|       users|
+----+------------+
|   1|   [1, 2, 3]|
|   2|[1, 2, 2, 3]|
+----+------------+