
Apache Avro Parquet java.lang.NoSuchFieldError: NULL_VALUE

I have been stuck on this for three days. I am trying to read Parquet files with Apache Avro. I simply read one file from a list of files, then iterate until all files have been processed.

The code works fine within its own Scala file; however, I suspect the problem has something to do with the dependencies and the external lib that I am including.

Has anyone else had a similar error and been able to solve this?

Code

  override def generateData(): Option[GenericRecord] = {
    val conf: Configuration = new Configuration()
    conf.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, true)
    if (filePaths.isEmpty) {
      // No files left: signal that this data source is finished
      dataSourceComplete()
      None
    } else {
      x += 1
      var line = parquetReader.read()
      if (line == null) {
        // The current file is exhausted: take the next path off the list
        // and open a fresh reader for it
        println(x)
        val nextFile = filePaths.last
        filePaths = filePaths.init
        println(nextFile)
        parquetReader = AvroParquetReader
          .builder[GenericRecord](HadoopInputFile.fromPath(new Path(nextFile), conf))
          .withConf(conf)
          .build()
        line = parquetReader.read()
      }
      Some(line)
    }
  }

Error

Uncaught error from thread [Raphtory-akka.actor.default-dispatcher-17]: NULL_VALUE, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Raphtory]
java.lang.NoSuchFieldError: NULL_VALUE
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:246)
        at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:231)
        at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:130)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at com.raphtory.ethereum.spout.EthereumTransactionSpout.generateData(EthereumTransactionSpout.scala:59)

This is my build.sbt:

scalaVersion := "2.12.11"
Compile / unmanagedJars += baseDirectory.value / "lib/raphtory.jar"
val AkkaVersion = "2.6.14"
libraryDependencies ++= Seq(
  "com.lightbend.akka" %% "akka-stream-alpakka-avroparquet" % "3.0.3",
  "com.typesafe.akka" %% "akka-stream" % AkkaVersion
)
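
Based on the stack trace, I suspect two different Avro versions end up on the classpath (parquet-avro compiled against one, something else pulling in another). If that is the case, would something like pinning a single Avro version in build.sbt help? (A sketch; the 1.10.2 below is a guess and would need to match whatever parquet-avro expects.)

  // Hypothetical: force one Avro version onto every transitive dependency.
  // The version number is illustrative, not verified against my classpath.
  dependencyOverrides += "org.apache.avro" % "avro" % "1.10.2"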


I came across the same issue, so I thought I'd share my solution. I discovered JAR shading: in my application, some dependency libraries introduce conflicts with the Avro version, so I shade the Avro library in my pom.xml, i.e. rename its package so it no longer conflicts with anything else.

The first thing is to add the maven-shade-plugin, which allows you to (a) create an uber JAR and (b) shade its contents; see the maven-shade-plugin documentation for more info. Here is a snippet:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.6.2</version>
    ...
  </plugin>

Then I shade the library:

  <relocation>
    <pattern>org.apache.avro</pattern>
    <shadedPattern>[RENAME-HERE].shaded.org.apache.avro</shadedPattern>
  </relocation>
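
Note that the question uses sbt rather than Maven. As a rough equivalent (a sketch, assuming the sbt-assembly plugin is installed; the plugin version and shaded prefix are illustrative), the same relocation can be expressed with assemblyShadeRules:

  // project/plugins.sbt — bring in sbt-assembly (version is illustrative)
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

  // build.sbt — rename org.apache.avro inside the assembled JAR so it cannot
  // clash with whatever Avro version other jars (e.g. lib/raphtory.jar) bundle
  assembly / assemblyShadeRules := Seq(
    ShadeRule.rename("org.apache.avro.**" -> "shaded.org.apache.avro.@1").inAll
  )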

