
How to get a list of the elements in column A that are not in column B of a DataFrame in Apache Spark?

I have two DataFrames, X and Y. X has a column A and Y has a column B; both columns are of type string. How do I get a list of the elements in column A that are not in column B?

Alternatively, given a string S, how do I check whether S is an element of column A?

Please help me! I code in Scala.

Regarding your first question (filter all elements within DataFrame X that are not in DataFrame Y):

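// Runs as-is in spark-shell; in an application, add `import spark.implicits._` for toDF
val X = Seq("1", "2", "3", "4", "5").toDF("A")
val Y = Seq("4", "5", "6", "7", "8").toDF("B")

// except returns the distinct rows of X that do not appear in Y
X.except(Y).show()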

Output (row order is not guaranteed):

+---+
|  A|
+---+
|  3|
|  1|
|  2|
+---+
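
Since the question asks for a list rather than a DataFrame, here is a minimal sketch of how the result could be collected into a Scala List. The object and method names (ColumnDiff, distinctOnlyIn, allOnlyIn) are made up for illustration and are not part of Spark. The second variant uses a left anti join instead of except, which keeps duplicate values from X.

```scala
import org.apache.spark.sql.{DataFrame, Encoders}

// Hypothetical helpers (names are illustrative, not part of Spark).
object ColumnDiff {
  // Distinct values in column `a` of `x` that never occur in column `b` of `y`.
  def distinctOnlyIn(x: DataFrame, a: String, y: DataFrame, b: String): List[String] =
    x.select(a).except(y.select(b)).as[String](Encoders.STRING).collect().toList

  // Same filter expressed as a left anti join: duplicates in `x` are preserved.
  def allOnlyIn(x: DataFrame, a: String, y: DataFrame, b: String): List[String] =
    x.join(y, x(a) === y(b), "left_anti")
      .select(x(a))
      .as[String](Encoders.STRING)
      .collect()
      .toList
}
```

With the X and Y defined above, ColumnDiff.distinctOnlyIn(X, "A", Y, "B") should give the values 1, 2 and 3, in no guaranteed order. Note that collect() pulls every result row to the driver, so this only makes sense when the difference is small enough to fit in memory.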

Your second question (checking if string S exists in column A in DataFrame X):

val lookFor = "3"
// Compare through a Column expression so lookFor needs no quoting or escaping
assert(X.where(X("A") === lookFor).count() > 0)
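
If the check is needed in more than one place, it could be wrapped in a small helper. The sketch below is only one way to do it: ColumnLookup and containsValue are hypothetical names, not Spark API. It compares through a Column expression, so values containing quotes need no escaping, and it asks for at most one matching row instead of counting all of them.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical helper (the name is illustrative, not part of Spark).
object ColumnLookup {
  // True if at least one row of `df` has `value` in `column`.
  def containsValue(df: DataFrame, column: String, value: String): Boolean =
    // head(1) fetches at most one matching row, so not every match has to be counted.
    df.where(col(column) === lit(value)).head(1).nonEmpty
}
```

Usage would look like ColumnLookup.containsValue(X, "A", "3"), which returns a Boolean rather than failing an assertion.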

Hope it helps :-)
