Spark 2.4.0 Avro Java - cannot resolve method from_avro
I'm trying to run a Spark stream from a Kafka queue containing Avro messages.
As per https://spark.apache.org/docs/latest/sql-data-sources-avro.html I should be able to use from_avro to convert the column value to Dataset<Row>.
However, I'm unable to compile the project because it complains that from_avro cannot be found. I can see the method declared in package.class of the dependency.
How can I use the from_avro method from org.apache.spark.sql.avro in my Java code locally?
import java.io.IOException;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.avro.*;

public class AvroStreamTest {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Creating local sparkSession here...
        Dataset<Row> df = sparkSession
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host:port")
                .option("subscribe", "avro_queue")
                .load();
        // Cannot resolve method 'from_avro'...
        df.select(from_avro(col("value"), jsonFormatSchema))
                .writeStream()
                .format("console")
                .outputMode("update")
                .start();
    }
}
pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-avro_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <!-- more dependencies below -->
</dependencies>
It seems like Java is unable to import names from sql.avro.package.class.
This is because of the generated class names. Importing the class with import org.apache.spark.sql.avro.package$; and then calling package$.MODULE$.from_avro(...) should work.
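To illustrate what "generated class names" means here: the Scala compiler turns a package object into two classes, `package` and `package$`. Java cannot name the first one because `package` is a reserved word, but `package$` is a legal identifier, and it exposes its singleton instance through a static `MODULE$` field. The sketch below mimics that shape in plain Java with no Spark dependency. The `package$` class here is a hand-written stand-in (not Spark's class), its `from_avro` only returns a string, and `PackageObjectDemo` and `decode` are hypothetical names used for illustration.

```java
// Hand-written stand-in for the class scalac generates from a Scala
// package object. This is NOT Spark's class; it only mimics the shape.
final class package$ {
    // The singleton instance is exposed through a static MODULE$ field.
    public static final package$ MODULE$ = new package$();

    private package$() {}

    // An instance method on the singleton, standing in for from_avro.
    // It returns a description string instead of a Spark Column.
    public String from_avro(String column, String jsonFormatSchema) {
        return "from_avro(" + column + ", " + jsonFormatSchema + ")";
    }
}

class PackageObjectDemo {
    // Calls the method the way Java code has to: through MODULE$.
    static String decode(String column, String schema) {
        return package$.MODULE$.from_avro(column, schema);
    }

    public static void main(String[] args) {
        System.out.println(decode("value", "{\"type\":\"string\"}"));
    }
}
```

Applied to the code in the question, the failing call would become package$.MODULE$.from_avro(col("value"), jsonFormatSchema), with import org.apache.spark.sql.avro.package$; replacing the wildcard import.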
You need to include spark-sql-avro in your pom.xml, which is available at
https://mvnrepository.com/artifact/org.apache.spark/spark-sql-avro_2.11/2.4.0-palantir.28-1-gdf34e2d