pyspark intersection() function to compare data frames

Below is the code I wrote to compare two DataFrames and apply the intersection function to them.

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext("local", "Simple App")
# HiveContext extends SQLContext, so a separate SQLContext is not needed
sqlContext = HiveContext(sc)

# Read the source table from SQL Server over JDBC (connection details redacted)
df = (sqlContext.read.format("jdbc")
      .option("url", "jdbc:sqlserver://xxx:xxx")
      .option("databaseName", "xxx")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("dbtable", "xxx")
      .option("user", "xxxx")
      .option("password", "xxxx")
      .load())

df.registerTempTable("test")

df1 = sqlContext.sql("select * from test where amitesh <= 300")
df2 = sqlContext.sql("select * from test where amitesh <= 400")

df3 = df1.intersection(df2)  # fails with the AttributeError shown below
df3.show()

I am getting the following error:

AttributeError: 'DataFrame' object has no attribute 'intersection'

If my understanding is correct, intersection() is a built-in method that comes from Python's set type. So:
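Python's set.intersection() does exist, but it belongs to the built-in set type and operates on in-memory Python collections. A Spark DataFrame is a distributed dataset, not a Python set, so it does not inherit that method. A tiny pure-Python sketch, using hypothetical (id, amitesh) rows in place of the two query results:

```python
# Hypothetical (id, amitesh) rows standing in for the two query results
rows_le_300 = [(1, 100), (1, 100)]
rows_le_400 = [(1, 100), (1, 100), (2, 350)]

# set.intersection is a method of Python's built-in set type:
# it works on in-memory, hashable values and drops duplicates
common = set(rows_le_300).intersection(rows_le_400)
print(common)  # {(1, 100)}
```

Spark's DataFrame.intersect() has the same distinct-rows semantics (it is equivalent to SQL INTERSECT), but it runs on the cluster rather than on data held in driver memory.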

1) If I want to use it in PySpark, do I need to import a special module in my code, or should it work as a built-in in PySpark as well?

2) To use this intersection() function, do I first need to convert the DataFrame to an RDD?

Please correct me wherever I am wrong. Can somebody give me a working example?

My goal is to get the common records from SQL Server and move them to Hive. For now, I am trying to get the intersection working first; once intersection() works, I can take care of the Hive requirement.

I got it working: instead of intersection(), I used intersect(), and it worked.

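Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.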

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM