How do I get a list of the elements in column A that are not in column B of another DataFrame in Apache Spark?

I have two DataFrames, X and Y. X has column A and Y has column B; both columns are of type string. How do I get a list of the elements in column A that are not in column B?

Also, given a string S, how do I check whether S is an element of column A?

Please help! I'm writing this in Scala.

Regarding your first question (keeping only the values in column A of X that do not appear in column B of Y):

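// toDF needs the implicits of an active SparkSession `spark` (auto-imported in spark-shell)
import spark.implicits._

val X = Seq("1", "2", "3", "4", "5").toDF("A")
val Y = Seq("4", "5", "6", "7", "8").toDF("B")

// except matches columns by position, so the different names (A vs B) are fine;
// it returns distinct rows, like SQL EXCEPT DISTINCT
X.except(Y).show()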

Output:

+---+
|  A|
+---+
|  3|
|  1|
|  2|
+---+
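Since the question asks for a list rather than a printed table, here is a minimal self-contained sketch, assuming Spark SQL (e.g. spark-sql 3.x) is on the classpath and a local session is acceptable. It swaps `except` for a `left_anti` join, which keeps the rows of X with no matching value in Y, and then collects the result to the driver. The names `ColumnDifference` and `valuesNotIn` are hypothetical, chosen for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnDifference {
  // Hypothetical helper: values of column `a` in `x` with no equal value in column `b` of `y`.
  // left_anti keeps only left-side rows that have no join match; duplicates in x are kept.
  def valuesNotIn(x: DataFrame, a: String, y: DataFrame, b: String): List[String] =
    x.join(y, x(a) === y(b), "left_anti")
      .select(a)
      .collect()           // pulls the result to the driver: only for small results
      .map(_.getString(0))
      .toList

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, use the existing `spark` instead
    val spark = SparkSession.builder().master("local[*]").appName("not-in").getOrCreate()
    import spark.implicits._

    val X = Seq("1", "2", "3", "4", "5").toDF("A")
    val Y = Seq("4", "5", "6", "7", "8").toDF("B")

    // Row order after a join is not guaranteed, so sort for a stable printout
    println(valuesNotIn(X, "A", Y, "B").sorted) // List(1, 2, 3)

    spark.stop()
  }
}
```

The design choice: `except` returns distinct rows, whereas the anti join keeps any duplicates in X; add `.distinct()` before `collect()` if you want set semantics. For very large results, keep working with the DataFrame instead of collecting it.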

Regarding your second question (checking whether string S exists in column A of DataFrame X):

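import org.apache.spark.sql.functions.col
val lookFor = "3"
// A Column comparison avoids building a SQL string from lookFor (no quoting issues)
assert(X.where(col("A") === lookFor).count() > 0)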
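If you need a Boolean rather than an assertion, or have many strings to check, a sketch like the following may help. It assumes Spark SQL on the classpath, and for the many-lookups case it assumes the distinct values of column A fit in driver memory. `ColumnLookup`, `contains`, and `distinctValues` are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnLookup {
  // Hypothetical helper for a single lookup: limit(1) caps the count at one row,
  // so Spark does not need to count every matching row.
  def contains(df: DataFrame, column: String, value: String): Boolean =
    df.where(df(column) === value).limit(1).count() > 0

  // Hypothetical helper: collect the distinct values of one string column into a driver-side Set.
  // Assumes those values fit comfortably in driver memory.
  def distinctValues(df: DataFrame, column: String): Set[String] =
    df.select(column).distinct().collect().map(_.getString(0)).toSet

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("lookup").getOrCreate()
    import spark.implicits._

    val X = Seq("1", "2", "3", "4", "5").toDF("A")

    println(contains(X, "A", "3")) // true
    println(contains(X, "A", "9")) // false

    // Many lookups: one Spark job to build the Set, then plain Set membership on the driver
    val valuesOfA = distinctValues(X, "A")
    println(Seq("3", "9").map(valuesOfA.contains)) // List(true, false)

    spark.stop()
  }
}
```

Each call to `contains` launches a Spark job, so collecting the values once pays off when you check many strings against a column of modest size.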

Hope it helps :-)
