[英]Compare two different columns from two different pyspark dataframe
我正在嘗試比較兩個不同數據框中的兩個不同列,如果找到匹配項,我將返回值 1 else None -
df1 =
df2 =
df1(預期輸出)=
我試過下面的代碼 -
def getImpact(row):
match = df2.filter(df2.second_key == row)
if match.count() > 0:
return 1
return None
udf_sol = udf(lambda x: getImpact(x), IntegerType())
df1 = df1.withcolumn('impact',udf_sol(df1.first_key))
但出現以下錯誤 - TypeError: cannot pickle '_thread.RLock' object
任何人都可以幫助我實現如上所示的預期輸出嗎?
謝謝
將 numpy 導入為 np
df1['final']= np.where(df1['first_key']==df2['second_key'],'1','None')
假設first_key
和second_key
是唯一的,您可以選擇跨數據幀連接 -
可以在此處找到更多示例和說明
from pyspark import SparkContext
from pyspark.sql import SQLContext
from functools import reduce
import pyspark.sql.functions as F
from pyspark.sql import Window
data_list1 = [
("abcd","Key1")
,("jkasd","Key2")
,("oigoa","Key3")
,("ad","Key4")
,("bas","Key5")
,("lkalsjf","Key6")
,("bsawva","Key7")
]
data_list2 = [
("cashj","Key1",10)
,("ax","Key11",12)
,("safa","Key5",21)
,("safasf","Key6",78)
,("vasv","Key3",4)
,("wgaga","Key8",0)
,("saasfas","Key7",10)
]
sparkDF1 = sql.createDataFrame(data_list1,['data','first_key'])
sparkDF2 = sql.createDataFrame(data_list2,['temp_data','second_key','frinks'])
>>> sparkDF1
+-------+---------+
| data|first_key|
+-------+---------+
| abcd| Key1|
| jkasd| Key2|
| oigoa| Key3|
| ad| Key4|
| bas| Key5|
|lkalsjf| Key6|
| bsawva| Key7|
+-------+---------+
>>> sparkDF2
+---------+----------+------+
|temp_data|second_key|frinks|
+---------+----------+------+
| cashj| Key1| 10|
| ax| Key11| 12|
| safa| Key5| 21|
| safasf| Key6| 78|
| vasv| Key3| 4|
| wgaga| Key8| 0|
| saasfas| Key7| 10|
+---------+----------+------+
#### Joining the dataframes on common columns
finalDF = sparkDF1.join(
sparkDF2
,(sparkDF1['first_key'] == sparkDF2['second_key'])
,'left'
).select(sparkDF1['*'],sparkDF2['frinks']).orderBy('frinks')
### Identifying impact if the frinks value is Null or Not
finalDF = finalDF.withColumn('impact',F.when(F.col('frinks').isNull(),0).otherwise(1))
>>> finalDF.show()
+-------+---------+------+------+
| data|first_key|frinks|impact|
+-------+---------+------+------+
| jkasd| Key2| null| 0|
| ad| Key4| null| 0|
| oigoa| Key3| 4| 1|
| abcd| Key1| 10| 1|
| bsawva| Key7| 10| 1|
| bas| Key5| 21| 1|
|lkalsjf| Key6| 78| 1|
+-------+---------+------+------+
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.