Spark and InfluxDB: okio conflict

I'm running a job on Spark on YARN and trying to emit messages to InfluxDB, but I'm crashing on an okio conflict:

22:17:54 ERROR ApplicationMaster - User class threw exception: java.lang.NoSuchMethodError: okio.BufferedSource.readUtf8LineStrict(J)Ljava/lang/String;
java.lang.NoSuchMethodError: okio.BufferedSource.readUtf8LineStrict(J)Ljava/lang/String;
    at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:212)
    at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)

Here are my dependencies:

val cdhVersion = "cdh5.12.2"
val sparkVersion = "2.2.0.cloudera2"
val parquetVersion = s"1.5.0-$cdhVersion"
val hadoopVersion = s"2.6.0-$cdhVersion"
val awsVersion = "1.11.295"
val log4jVersion = "1.2.17"
val slf4jVersion = "1.7.5" 

lazy val sparkDependencies = Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.hadoop" % "hadoop-common" % "2.2.0"
)

lazy val otherDependencies = Seq(
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "2.2.0",
  "org.clapper" %% "grizzled-slf4j" % "1.3.1",
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2" % "runtime",
  "org.slf4j" % "slf4j-log4j12" % slf4jVersion,
  "com.typesafe" % "config" % "1.3.1",
  "org.rogach" %% "scallop" % "3.0.3",
  "org.influxdb" % "influxdb-java" % "2.9"
)


libraryDependencies ++= sparkDependencies.map(_ % "provided" ) ++ otherDependencies

dependencyOverrides ++= Set("com.squareup.okio" % "okio" % "1.13.0")

Using the same jar, I can run a successful test that instantiates an InfluxDB instance in a non-Spark job. But trying to do the same from Spark throws the error above. It sounds like Spark must have its own version of okio that causes this conflict at runtime when I use spark-submit... but it doesn't show up when I dump the dependency tree. Any advice on how I can bring my desired okio 1.13.0 onto the Spark cluster's runtime classpath?

(As I'm typing this, I'm thinking of trying shading, which I will do now.) Thanks
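
For reference, a minimal sketch of what that shading could look like with sbt-assembly (the plugin version and the shaded package name here are illustrative, not from my actual build):

    // project/plugins.sbt — assumes the sbt-assembly plugin
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

    // build.sbt — relocate okio's classes inside the fat jar so they
    // cannot collide with whatever okio Spark already ships
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("okio.**" -> "shaded.okio.@1").inAll
    )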

In my case, using Apache Spark 1.6.3 with the HDP Hadoop distribution:

  1. Run spark-shell and check on the web UI which jars are used.
  2. Search the assembly jar for okhttp:

         jar tf /usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar | grep okhttp

  3. Extract the okhttp pom to find its version:

         jar xf /usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar META-INF/maven/com.squareup.okhttp/okhttp/pom.xml

=> version 2.4.0

No idea what provides this version.

I had the same problem on Spark 2.1.0.

Solution: I downgraded the influxdb-java dependency from version 2.11 to 2.1 (2.12 has an empty child dependency, and we had problems assembling the fat jar).

influxdb-java 2.1 has a different API, but it works in spark-submit applications.

If you want to use InfluxDBResultMapper to retrieve data from InfluxDB in your Spark application, try upgrading to version 2.7 first:

        <dependency>
            <groupId>org.influxdb</groupId>
            <artifactId>influxdb-java</artifactId>
            <version>2.7</version>
        </dependency>
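
For illustration, a minimal REPL-style Scala sketch of the InfluxDBResultMapper API (available since influxdb-java 2.7); the connection details, measurement name, and columns are hypothetical:

    import org.influxdb.InfluxDBFactory
    import org.influxdb.annotation.{Column, Measurement}
    import org.influxdb.dto.Query
    import org.influxdb.impl.InfluxDBResultMapper
    import scala.annotation.meta.field

    // Hypothetical measurement; the name and columns are examples only.
    @Measurement(name = "cpu")
    class CpuPoint {
      @(Column @field)(name = "time") var time: java.time.Instant = _
      @(Column @field)(name = "idle") var idle: Double = _
    }

    val influxDB = InfluxDBFactory.connect("http://localhost:8086", "user", "pass")
    val result = influxDB.query(new Query("SELECT time, idle FROM cpu", "mydb"))

    // toPOJO maps the result rows onto the annotated class by column name.
    val points = new InfluxDBResultMapper().toPOJO(result, classOf[CpuPoint])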

That fixed it for me!

I know this is an older question, but I just spent two days dealing with this. I came across this question, but the current answers did not help me. I use Maven in my project, and we build an uber jar.

In order to get around this, I had to add a "relocation" to the maven-shade-plugin configuration.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <relocations>
            <relocation>
                <pattern>okio</pattern>
                <shadedPattern>com.shaded.okio</shadedPattern>
            </relocation>
        </relocations>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
</plugin>

Based on my understanding, com.squareup.okhttp moved to com.squareup.okhttp3 (https://mvnrepository.com/artifact/com.squareup.okhttp/okhttp).

However, the Spark libraries use the older version (okhttp). This also did not show up in the Maven dependency tree. Since the Spark jars are already on the cluster, the okio dependency was being replaced at runtime by an older version, so the newer method could not be found.
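
A quick way to confirm this kind of runtime replacement is to ask the JVM where okio actually comes from, for example in spark-shell (a diagnostic sketch; okio.Buffer is just a convenient class to probe):

    // Which jar was okio loaded from? (getCodeSource may be null for
    // bootstrap-classpath classes, so guard against that)
    val src = classOf[okio.Buffer].getProtectionDomain.getCodeSource
    println(s"okio loaded from: ${if (src == null) "unknown" else src.getLocation}")

    // Does the loaded okio have the method from the stack trace?
    // Throws NoSuchMethodException on versions that are too old.
    classOf[okio.BufferedSource].getMethod("readUtf8LineStrict", classOf[Long])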

Reference (found solution here): https://community.cloudera.com/t5/Support-Questions/How-to-provide-a-different-dependency-for-RDD-in-spark/td-p/189387
