
Left join on two DataFrames giving error cannot be applied to (org.apache.spark.sql.Dataset, org.apache.spark.sql.Column, String)

I am able to read both DataFrames, but joining them gives me an error. The same join works in a notebook.

val s3Reader = new S3Reader(new S3Configuration, sparkSession, "mece_gaia_gaia_property_mapping")

val geoFeaturesPropertyDF = s3Reader.get(StorageFormat.PARQUET, "s3n:" + giNewBucket + geoInsightsPath + "/properties.parquet")

val meceGaiaGaia = s3Reader.get(StorageFormat.PARQUET, "s3:" + outputBucket + gaiaMeceGaiaPropertiesMappingPath)

val meceGaiaGaiaProperties = geoFeaturesPropertyDF.join(meceGaiaGaia, meceGaiaGaia("gaia_id") === geoFeaturesPropertyDF("gaia_id"), "left")

But when joining them, I get this error:

error: overloaded method value join with alternatives:
[ERROR]   (right: org.apache.spark.sql.Dataset[_],joinExprs: org.apache.spark.sql.Column,joinType: String)org.apache.spark.sql.DataFrame <and>
[ERROR]   (right: org.apache.spark.sql.Dataset[_],usingColumns: Seq[String],joinType: String)org.apache.spark.sql.DataFrame
[ERROR]  cannot be applied to (org.apache.spark.sql.Dataset, org.apache.spark.sql.Column, String)
[ERROR]             .join(meceGaiaGaia, meceGaiaGaia("gaia_id") === geoFeaturesPropertyDF("gaia_id"), "left")

Their schemas:

meceGaiaGaia schema:

org.apache.spark.sql.types.StructType = StructType(StructField(gaia_id,StringType,true), StructField(short_name,StringType,true), StructField(long_name,StringType,true), StructField(category,StringType,true), StructField(expe_property_id,IntegerType,true), StructField(airport_code,StringType,true), StructField(mece_gaia_id,StringType,true), StructField(mece_short_name,StringType,true), StructField(mece_long_name,StringType,true), StructField(mece_category,StringType,true), StructField(province_id,StringType,true), StructField(province,StringType,true), StructField(country_id,StringType,true), StructField(country,StringType,true), StructField(continent,StringType,true), StructField(super_region,StringType,true))

geoFeaturesPropertyDF schema:

org.apache.spark.sql.types.StructType = StructType(StructField(gaia_id,StringType,true), StructField(source_id,StringType,true), StructField(type,StringType,true), StructField(status,StringType,true), StructField(creation_time,StringType,true), StructField(update_time,StringType,true), StructField(attributes,MapType(StringType,StringType,true),true), StructField(ancestors_id,StringType,true), StructField(hierarchy,ArrayType(MapType(StringType,StringType,true),true),true), StructField(categories,ArrayType(StringType,true),true), StructField(classifiers_set,MapType(StringType,ArrayType(MapType(StringType,StringType,true),true),true),true), StructField(short_name,StringType,true), StructField(long_name,StringType,true), StructField(ancestors,ArrayType(StringType,true),true), StructFi

Any help is appreciated.

val meceGaiaGaiaProperties =
  geoFeaturesPropertyDF.join(meceGaiaGaia,
    geoFeaturesPropertyDF("gaia_id") === meceGaiaGaia("gaia_id"),
    "left")
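If a single `gaia_id` column is wanted in the result, the `usingColumns` overload shown in the error message is an alternative worth noting: it sidesteps any mistake in building the `Column` expression and avoids duplicating the join key. A minimal sketch, assuming the two DataFrames have been read as above:

```scala
// Join on the shared column name. With Seq("gaia_id"), Spark keeps a
// single gaia_id column in the output instead of one copy per side.
val joined = geoFeaturesPropertyDF.join(meceGaiaGaia, Seq("gaia_id"), "left")
```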

Update: changed the code to use sparkSession.read.parquet instead of S3Reader, and it works.

This is consistent with the usual cause of this error: the compiler printed the expected parameter type as `Dataset[_]` but the supplied argument as a bare `Dataset`, which suggests `S3Reader.get` returned a `Dataset` class from a different (binary-incompatible) Spark artifact than the one the `join` call was compiled against. Reading directly through `sparkSession` keeps everything on a single Spark version.

val geoFeaturesPropertyDF = sparkSession.read.parquet("s3n:" + giNewBucket + geoInsightsPath + "/properties.parquet")

val meceGaiaGaia = sparkSession.read.parquet("s3:" + outputBucket + gaiaMeceGaiaPropertiesMappingPath)

val meceGaiaGaiaProperties = geoFeaturesPropertyDF.join(meceGaiaGaia, meceGaiaGaia("gaia_id") === geoFeaturesPropertyDF("gaia_id"), "left")
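One caveat with the `Column`-expression form of the join used here: the result keeps `gaia_id` from both sides, so later references to the column are ambiguous. A sketch of dropping the right-hand copy after the join (names as in the code above):

```scala
// The Column-expression join retains gaia_id from both inputs;
// drop the copy coming from meceGaiaGaia to leave a single column.
val deduped = meceGaiaGaiaProperties.drop(meceGaiaGaia("gaia_id"))
```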


Statement: the technical posts on this site are licensed under CC BY-SA 4.0; if you reprint them, please credit this site or the original source. For any questions, contact yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM