How to compare two data frames in PySpark
c = df[df['CUSTOMER_EMAIL_ID'].isin(d.CUSTOMER_EMAIL_ID)]
How can I write the same expression in PySpark?
If you're asking "give me all the rows from df where the CUSTOMER_EMAIL_ID field has a matching value in the CUSTOMER_EMAIL_ID field of d", then I think your question can be answered with a left semi join, specifically:
c = df.join(d, 'CUSTOMER_EMAIL_ID', 'leftsemi')
Conceptually, a left (right) semi join is an inner join followed by dropping the right (left) columns, with one difference: each row on the kept side appears at most once, however many matches it has on the other side.