Spark 2.4.0 Avro Java - cannot resolve method from_avro

I'm trying to run a Spark stream from a Kafka queue containing Avro messages.

As per https://spark.apache.org/docs/latest/sql-data-sources-avro.html, I should be able to use from_avro to decode the value column into a Dataset<Row>.

However, I'm unable to compile the project because the compiler complains that from_avro cannot be found. I can see the method declared in package.class of the dependency.

How can I use the from_avro method from org.apache.spark.sql.avro in my Java code locally?

import java.io.IOException;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.avro.*;


public class AvroStreamTest {
    public static void main(String[] args) throws IOException, InterruptedException {

        // Creating local sparkSession here...

        Dataset<Row> df = sparkSession
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host:port")
                .option("subscribe", "avro_queue")
                .load();

        // Cannot resolve method 'from_avro'...
        df.select(from_avro(col("value"), jsonFormatSchema)).writeStream().format("console")
                .outputMode("update")
                .start();
    }
}
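The snippet above passes jsonFormatSchema to from_avro without defining it. from_avro expects the Avro schema as a JSON string, and the Spark Avro guide reads that string from an .avsc file. A minimal sketch of that step, assuming the schema is stored in a file; the class name AvroSchemaLoader and the path src/main/resources/user.avsc are hypothetical placeholders:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AvroSchemaLoader {

    // Reads an Avro schema (.avsc) file and returns its JSON text, which is the
    // form from_avro expects for its jsonFormatSchema argument.
    static String loadSchema(Path avscFile) {
        try {
            return new String(Files.readAllBytes(avscFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read Avro schema " + avscFile, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical location: point this at the schema the Kafka producer writes with.
        String jsonFormatSchema = loadSchema(Paths.get("src/main/resources/user.avsc"));
        System.out.println(jsonFormatSchema);
    }
}
```

Files.readAllBytes is used rather than Files.readString so the sketch also compiles on Java 8, which Spark 2.4 supports.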

pom.xml:

<dependencies>
    <dependency> 
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-avro_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <!-- more dependencies below -->
</dependencies>

It seems Java is unable to import names from sql.avro.package.class.

It's because of the class names the Scala compiler generates for the package object: from_avro ends up in the class org.apache.spark.sql.avro.package (with the implementation on the singleton class package$), and package is a reserved word in Java, so the first class cannot be named from Java source. Importing the singleton class as

import org.apache.spark.sql.avro.package$;

and then calling package$.MODULE$.from_avro(...) should work.
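A minimal end-to-end sketch of that workaround applied to the question's stream, assuming Spark 2.4.0 with Scala 2.11, the spark-avro_2.11 dependency from the question's pom, and spark-sql-kafka-0-10_2.11 on the classpath for the kafka source (that dependency is an assumption, not shown in the pom). The class name AvroStreamWorkaround, the local[*] master and the one-field User schema are hypothetical placeholders:

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;

// The Scala package object org.apache.spark.sql.avro compiles to a class named
// package$ whose singleton instance is exposed as the static field MODULE$.
import org.apache.spark.sql.avro.package$;

import static org.apache.spark.sql.functions.col;

public class AvroStreamWorkaround {

    // Hypothetical writer schema: replace with the schema the Kafka producer actually uses.
    static final String USER_SCHEMA =
            "{\"type\":\"record\",\"name\":\"User\","
            + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";

    public static void main(String[] args) throws StreamingQueryException {
        SparkSession spark = SparkSession.builder()
                .appName("AvroStreamWorkaround")
                .master("local[*]")
                .getOrCreate();

        // The kafka source exposes the message payload as a binary "value" column.
        Dataset<Row> df = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host:port")
                .option("subscribe", "avro_queue")
                .load();

        // "package" is a reserved word in Java, so the static forwarders in
        // package.class cannot be named; call from_avro on the singleton instead.
        Column decoded = package$.MODULE$.from_avro(col("value"), USER_SCHEMA).as("record");

        StreamingQuery query = df.select(decoded)
                .writeStream()
                .format("console")
                .outputMode("update")
                .start();

        // Block so the local driver does not exit while the stream is running.
        query.awaitTermination();
    }
}
```

In Spark 2.4, DataStreamWriter.start() declares no checked exceptions, so only awaitTermination() needs a throws clause here; later Spark versions may differ.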

You need to include spark-sql-avro in your pom.xml; it is available at:

https://mvnrepository.com/artifact/org.apache.spark/spark-sql-avro_2.11/2.4.0-palantir.28-1-gdf34e2d

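Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.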

© 2020-2024 STACKOOM.COM