
Spark 2.1.0, cannot resolve column name when doing second join

I have three tables, and two of them share keys, so I did a join on A & B to get D.

Now I want to finish by joining D with C.

The problem is that I get this error:

org.apache.spark.sql.AnalysisException: Cannot resolve column name "ClaimKey" among (_1, _2);
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:219)

This is the actual code, from Zeppelin:

joinedperson.printSchema
filteredtable.printSchema
val joined = joinedperson.joinWith(filteredtable, 
    filteredtable.col("ClaimKey") === joinedperson.col("ClaimKey"))

These are the schemas of the two tables I am trying to join; the problem is with ClaimKey in the first schema.

root
 |-- _1: struct (nullable = false)
 |    |-- clientID: string (nullable = true)
 |    |-- PersonKey: string (nullable = true)
 |    |-- ClaimKey: string (nullable = true)
 |-- _2: struct (nullable = false)
 |    |-- ClientID: string (nullable = true)
 |    |-- MyPersonKey: string (nullable = true)
root
 |-- clientID: string (nullable = true)
 |-- ClaimType: string (nullable = true)
 |-- ClaimKey: string (nullable = true)

I had read the original data in from Parquet files, then used case classes to map the rows into classes, giving me Datasets.

I suspect it is due to the tuples, so how can I do this join?

The structure of your first DataFrame is nested: ClaimKey is a field within another field (_1). To access such a field, you can simply give the "route" to that field, with parent fields separated by dots:

val joined = joinedperson.joinWith(filteredtable, 
  filteredtable.col("ClaimKey") === joinedperson.col("_1.ClaimKey"))
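To see why the bare name fails while the dotted path works, here is a minimal sketch in plain Scala (no Spark) of how a dot-separated path like "_1.ClaimKey" descends into a nested structure. The `resolve` helper and the `Map`-based row are hypothetical stand-ins for illustration, not Spark's actual resolver: at the top level only `_1` and `_2` exist, so `"ClaimKey"` alone finds nothing, matching the "Cannot resolve column name ... among (_1, _2)" error.

```scala
// Hypothetical stand-in for the nested row produced by the first joinWith:
// the top level has only _1 and _2, each holding the original columns.
val row: Map[String, Any] = Map(
  "_1" -> Map("clientID" -> "c1", "PersonKey" -> "p1", "ClaimKey" -> "k1"),
  "_2" -> Map("ClientID" -> "c1", "MyPersonKey" -> "mp1")
)

// Resolve a dot-separated path by walking one nesting level per segment.
def resolve(row: Map[String, Any], path: String): Option[Any] =
  path.split('.').foldLeft(Option[Any](row)) { (acc, field) =>
    acc.collect { case m: Map[_, _] =>
      m.asInstanceOf[Map[String, Any]].get(field)
    }.flatten
  }

resolve(row, "ClaimKey")    // None: no such field at the top level
resolve(row, "_1.ClaimKey") // Some(k1): the full path reaches the nested field
```

As an alternative to dotted paths, Spark can also flatten the struct first (e.g. `joinedperson.select($"_1.*")`) so the columns become top-level again before the second join.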

