How do I read an Avro file as a list of objects in Java Spark?

I have an Avro file that I want to read and operate on after converting its records to the class that represents them.

I've tried loading it as both an RDD and a Dataset in Java Spark, but in both cases I'm unable to convert the result to the required object type.

As a Dataset:

Dataset<MyClass> input = sparkSession.read().format("com.databricks.spark.avro").load(inputPath)
                .as(Encoders.bean(MyClass.class)); 

This fails with the error "Cannot have circular references in bean class, but got the circular reference of class class org.apache.avro.Schema".
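A likely reading of this error, assuming Spark's bean encoder discovers fields through JavaBean getters: Avro-generated classes expose a getSchema() accessor, so the encoder treats "schema" as a property and then tries to describe org.apache.avro.Schema, which refers back to itself. The sketch below uses only the JDK's java.beans.Introspector to show how such a getter surfaces as a property. FakeRecord and FakeSchema are hypothetical stand-ins, not real Avro or Spark classes.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertyDemo {

    // Hypothetical stand-in for org.apache.avro.Schema: it refers back to
    // itself, the kind of cycle that cannot be flattened into columns.
    public static class FakeSchema {
        private FakeSchema elementType;
        public FakeSchema getElementType() { return elementType; }
        public void setElementType(FakeSchema elementType) { this.elementType = elementType; }
    }

    // Hypothetical stand-in for an Avro-generated record class.
    public static class FakeRecord {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        // Avro-generated classes expose a getSchema() accessor like this one.
        public FakeSchema getSchema() { return new FakeSchema(); }
    }

    // Lists the readable JavaBean properties of a class, skipping "class".
    public static List<String> readableProperties(Class<?> type) {
        List<String> names = new ArrayList<>();
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(type).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && !"class".equals(pd.getName())) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + type, e);
        }
        return names;
    }

    public static void main(String[] args) {
        // The "schema" getter shows up next to the real data field "name".
        System.out.println(readableProperties(FakeRecord.class));
    }
}
```

If this reading is right, the problem lies in pairing Encoders.bean with an Avro-generated class rather than in the file itself; a plain bean without a schema getter would not trigger the same cycle.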

As an RDD:

JavaRDD<String> input = sparkContext.textFile(inputPath);

How can I convert this JavaRDD<String> to a JavaRDD<MyClass> or a Dataset<MyClass>?

I'm pretty new to this, so pardon me if I'm missing something basic, but I've been unable to find a working solution.

This was solved by using SparkAvroLoader from https://github.com/CeON/spark-utils

