Writing to DSE graph from EMR
We are trying to write to a DSE graph (Cassandra) from EMR and keep getting the errors below. My JAR is a shaded jar that includes the BYOS dependencies. Any help would be appreciated.
java.lang.UnsatisfiedLinkError: org.apache.cassandra.utils.NativeLibraryLinux.getpid()J
at org.apache.cassandra.utils.NativeLibraryLinux.getpid(Native Method)
at org.apache.cassandra.utils.NativeLibraryLinux.callGetpid(NativeLibraryLinux.java:124)
at org.apache.cassandra.utils.NativeLibrary.getProcessID(NativeLibrary.java:429)
at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:386)
at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:367)
at org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:300)
at org.apache.cassandra.utils.UUIDGen.<clinit>(UUIDGen.java:41)
at com.datastax.bdp.graph.spark.sql.vertex.SimpleVertexIdAssigner$.simpleEdgeId(SimpleVertexIdAssigner.scala:19)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:417)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:416)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:131)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:220)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:298)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/04/26 12:55:49 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 18, ip-10-69-16-79.vpc.internal, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.utils.UUIDGen
at com.datastax.bdp.graph.spark.sql.vertex.SimpleVertexIdAssigner$.simpleEdgeId(SimpleVertexIdAssigner.scala:19)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:417)
at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:416)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
Usually such errors happen when the temporary directory is mounted with the noexec attribute, which prevents loading of the native library used by the Java driver. The usual workaround is to point Java to another location for temporary files with the -Djava.io.tmpdir=... flag; that location must not be mounted with noexec.
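To check whether the noexec explanation applies on a given node, a small diagnostic along the following lines could help. It is only a sketch under assumptions: the class name `TmpDirCheck` and the method `isNoexec` are made up for illustration, it assumes a Linux host where `/proc/mounts` is readable, and it ignores escaped characters in mount paths.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of DSE, Spark, or EMR.
public class TmpDirCheck {

    // Returns true if the most specific mount point containing `path`
    // lists the noexec option. `mounts` is text in /proc/mounts format:
    //   device mountpoint fstype options dump pass
    public static boolean isNoexec(String mounts, String path) {
        String bestMount = "";
        boolean noexec = false;
        for (String line : mounts.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // skip blank or malformed lines
            }
            String mountPoint = fields[1];
            boolean contains = path.equals(mountPoint)
                    || mountPoint.equals("/")
                    || path.startsWith(mountPoint + "/");
            // Prefer the longest matching mount point; a later entry for the
            // same mount point overrides an earlier one.
            if (contains && mountPoint.length() >= bestMount.length()) {
                bestMount = mountPoint;
                noexec = Arrays.asList(fields[3].split(",")).contains("noexec");
            }
        }
        return noexec;
    }

    public static void main(String[] args) throws IOException {
        String tmp = Paths.get(System.getProperty("java.io.tmpdir"))
                .toAbsolutePath().normalize().toString();
        String mounts = new String(
                Files.readAllBytes(Paths.get("/proc/mounts")), StandardCharsets.UTF_8);
        System.out.println("java.io.tmpdir = " + tmp);
        System.out.println("mounted noexec = " + isNoexec(mounts, tmp));
    }
}
```

Since the stack trace comes from a Spark executor, the check would need to run on the executor hosts, and the -Djava.io.tmpdir=... flag would normally be passed to executors through Spark's `spark.executor.extraJavaOptions` setting (and `spark.driver.extraJavaOptions` for the driver).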
P.S. Unfortunately, I don't know much about EMR.
It turned out to be a JNA issue. I added the JNA dependency to the shaded jar and it worked.
<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
    <version>4.2.2</version>
</dependency>
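One way to confirm that JNA actually ended up on the classpath of the shaded jar is a quick lookup of its entry class, `com.sun.jna.Native`. The sketch below is illustrative only: the class name `ClasspathCheck` and the method `isClassPresent` are hypothetical, and the check only shows that the class can be found, not that the native library loads successfully.

```java
// Hypothetical helper for checking whether a class is visible on the classpath.
public class ClasspathCheck {

    // Looks the class up without initializing it, so no static
    // initializers (and no native loading) are triggered.
    public static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("JNA on classpath: " + isClassPresent("com.sun.jna.Native"));
    }
}
```

Running it with the shaded jar on the classpath should report whether the dependency was bundled.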