I have been stuck on this for three days. I am trying to read Parquet files with Apache Avro: the code reads a file from a list of files, then iterates until all files are complete.
The code works fine in its own Scala file, so I suspect the problem has something to do with dependencies and the external lib I am including.
Has anyone else hit a similar error and managed to solve it?
Code
override def generateData(): Option[GenericRecord] = {
  val conf: Configuration = new Configuration()
  conf.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, true)
  if (filePaths.isEmpty) {
    dataSourceComplete()
    None
  } else {
    x += 1
    var line = parquetReader.read()
    if (line == null) {
      println(x)
      val nextFile = filePaths.last
      filePaths = filePaths.init
      println(nextFile)
      parquetReader = AvroParquetReader
        .builder[GenericRecord](HadoopInputFile.fromPath(new Path(nextFile), conf))
        .withConf(conf)
        .build()
      line = parquetReader.read()
    }
    Some(line)
  }
}
Error
Uncaught error from thread [Raphtory-akka.actor.default-dispatcher-17]: NULL_VALUE, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Raphtory]
java.lang.NoSuchFieldError: NULL_VALUE
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:246)
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:231)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:130)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
at com.raphtory.ethereum.spout.EthereumTransactionSpout.generateData(EthereumTransactionSpout.scala:59)
This is my build.sbt
scalaVersion := "2.12.11"
Compile / unmanagedJars += baseDirectory.value / "lib/raphtory.jar"
val AkkaVersion = "2.6.14"
libraryDependencies ++= Seq(
  "com.lightbend.akka" %% "akka-stream-alpakka-avroparquet" % "3.0.3",
  "com.typesafe.akka" %% "akka-stream" % AkkaVersion
)
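For what it's worth, conflicting Avro versions in an sbt build can be surfaced with `sbt dependencyTree` (built into sbt 1.4+). One experiment is forcing a single Avro version with `dependencyOverrides`; this is only a sketch, and the version number below is an assumed example, not a known-good pin:

```scala
// build.sbt (sketch): force one Avro version across the whole build.
// "1.10.2" is an assumed example version; pick whichever version
// parquet-avro on your classpath was compiled against.
dependencyOverrides += "org.apache.avro" % "avro" % "1.10.2"
```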
I ran into the same issue, so I thought I'd share my fix. I discovered that some of my dependency libraries pull in a conflicting Avro version, so I shade the Avro library in my pom.xml, i.e. rename its package so it no longer conflicts with anything else.
The first step is to add the maven-shade-plugin, which lets you (a) create an uber JAR and (b) shade its contents. Here is a snippet:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.6.2</version>
  ...
</plugin>
Then I shade the library:
<relocations>
  <relocation>
    <pattern>org.apache.avro</pattern>
    <shadedPattern>[RENAME-HERE].shaded.org.apache.avro</shadedPattern>
  </relocation>
</relocations>
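Since the question builds with sbt rather than Maven, the equivalent relocation can be done with the sbt-assembly plugin's shade rules. This is a sketch that assumes sbt-assembly is enabled in project/plugins.sbt, and `myshaded` is a placeholder prefix:

```scala
// build.sbt (sketch, requires the sbt-assembly plugin).
// ShadeRule.rename relocates the Avro packages inside the uber JAR,
// mirroring the Maven <relocation> above; "myshaded" is a placeholder.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("org.apache.avro.**" -> "myshaded.org.apache.avro.@1").inAll
)
```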