How do I read an Avro file as a list of objects in Java Spark?

I have an Avro file that I want to read and operate on after converting its records to the class that represents them.

I've tried loading it both as an RDD and as a Dataset in Java Spark, but in both cases I'm unable to convert it to the required object type.

As a Dataset

Dataset<MyClass> input = sparkSession.read().format("com.databricks.spark.avro").load(inputPath)
                .as(Encoders.bean(MyClass.class)); 

This fails with the error "Cannot have circular references in bean class, but got the circular reference of class class org.apache.avro.Schema"

As an RDD

JavaRDD<String> input = sparkContext.textFile(inputPath);

How can I convert this JavaRDD<String> to a JavaRDD<MyClass> or a Dataset<MyClass>?

I'm pretty new to this, so pardon me if I'm missing something basic, but I've been unable to find a working solution.

This was solved by using SparkAvroLoader from https://github.com/CeON/spark-utils.
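For context on the circular-reference error in the question, here is a minimal sketch of what bean introspection sees. It assumes MyClass is an Avro-generated record, which would expose a getSchema() getter returning org.apache.avro.Schema. It uses only the JDK's java.beans.Introspector, not Spark itself, and FakeSchema and the MyClass body are hypothetical stand-ins rather than the real Avro classes. A bean-based encoder that walks getters would likely treat "schema" as a property and then follow a type whose getters return that same type.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertyDemo {

    // Hypothetical stand-in for org.apache.avro.Schema: it has a getter
    // that returns its own type, which makes it self-referential.
    public static class FakeSchema {
        public FakeSchema getElementType() { return null; }
    }

    // Hypothetical stand-in for an Avro-generated record class.
    public static class MyClass {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        // Assumption: generated records expose their schema via a getter like this.
        public FakeSchema getSchema() { return new FakeSchema(); }
    }

    // Lists the readable bean properties of a type, skipping the built-in "class".
    public static List<String> readableProperties(Class<?> type) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && !"class".equals(pd.getName())) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "schema" shows up next to the real data field "name".
        System.out.println(readableProperties(MyClass.class));
        // The schema type refers back to itself through "elementType".
        System.out.println(readableProperties(FakeSchema.class));
    }
}
```

If this is what is happening, it would explain why a loader that deserializes into the generated Avro class directly, rather than going through a bean encoder, avoids the problem.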
