
Apache Spark - 'LeftAnti' join Ambiguous column error

I am trying to perform a leftanti join in Spark 2.4, but I am running into errors.

Dataset<Row> df = spark.read()
            .option("mode", "DROPMALFORMED")
            .schema(schema)
            .csv("/some/path/cars.csv");

Dataset<Row> Audis = df.filter(col("Make").equalTo("Audi"));
Dataset<Row> BMWs = df.filter(col("Make").equalTo("BMW"));
Audis.join(BMWs, "Make").show();

df.as("df").join(Audis.as("audi"), col("Make"), "leftanti")
           .show();

The first join works fine, but for the leftanti join I get the following error:

org.apache.spark.sql.AnalysisException: Reference 'Make' is ambiguous, could be: df.Make, audi.Make.;

Why would this be ambiguous? It should know which column to check for 'IS NOT NULL' in this kind of join.

Other examples show this in Scala by providing a column expression, but that seems impossible in Java, as there is no method signature that accepts an expression string like 'df.Make == audi.Make':

// No method exists for such a signature
df.as("df").join(Audis.as("audi"), "df.Make == audi.Make", "leftanti")

So far, all the examples I have seen of this type of join are not in Java. Can someone explain why this error occurs and show a working example?

After consulting with some colleagues and spending a few hours on this: you need to use col("MyColumn").equalTo(col("OtherColumn")). The bare col("Make") matches a column on both sides of the join, which is exactly why Spark reports it as ambiguous; qualifying each column with its dataset alias resolves it.

This example works:

df.as("df").join(Audis.as("audi"), col("df.Make").equalTo(col("audi.Make")), "leftanti")
