
Spark java.lang.NoSuchMethodError


I ran the following UDF, which uses scipy cosine similarity, on Spark on YARN. I first tested it on a sample of 30 observations of the data; it ran fine and built the cosine similarity matrix in 5 seconds.

Here is the code:

def cosineSimilarity(df):
    """Cosine similarity of each document with every other document."""

    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType
    from scipy.spatial import distance

    # scipy returns the cosine *distance*, so 1 - distance gives the similarity
    cosine = udf(lambda v1, v2: (
        float(1 - distance.cosine(v1, v2)) if v1 is not None and v2 is not None else None),
        DoubleType())

    # Cross product of the table with itself, pairing every document with every other
    crosstabDF = df.withColumnRenamed('id', 'id_1').withColumnRenamed('w2v_vector', 'w2v_vector_1')\
        .join(df.withColumnRenamed('id', 'id_2').withColumnRenamed('w2v_vector', 'w2v_vector_2'))

    similardocs_df = crosstabDF.withColumn('cosinesim', cosine("w2v_vector_1", "w2v_vector_2"))

    return similardocs_df

#similardocs_df=cosineSimilarity(w2vdf.select('id','w2v_vector'))


similardocs_df=cosineSimilarity(w2vdf_sample.select('id','w2v_vector'))
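As an aside, the join above has no join condition, so it depends on Spark allowing a Cartesian product; Spark 2.x rejects that by default. A minimal sketch of the two usual remedies, assuming Spark 2.1+ and a SparkSession named spark:

# Sketch, assuming Spark 2.1+: either allow implicit Cartesian products ...
spark.conf.set("spark.sql.crossJoin.enabled", "true")

# ... or make the cross product explicit with crossJoin() instead of join():
left = df.withColumnRenamed('id', 'id_1').withColumnRenamed('w2v_vector', 'w2v_vector_1')
right = df.withColumnRenamed('id', 'id_2').withColumnRenamed('w2v_vector', 'w2v_vector_2')
crosstabDF = left.crossJoin(right)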

Then I tried to pass the whole matrix (58K records; its self cross join yields roughly 3.4 billion pairs, so a vastly larger job than the 30-row sample). It runs for a while and then gives me the following error:

I would like to mention that one time it did run on the whole data within 5 minutes. But now it gives me this error on the full data, while the sample still runs with no issues.

WARN  org.spark_project.jetty.servlet.ServletHandler (ServletHandler.java:doHandle(667)) - Error for /jobs/
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.getDispatcherType()Ljavax/servlet/DispatcherType;
    at org.spark_project.jetty.servlets.gzip.AbstractCompressedStream.doCompress(AbstractCompressedStream.java:248)
    at org.spark_project.jetty.servlets.gzip.AbstractCompressedStream.checkOut(AbstractCompressedStream.java:354)
    at org.spark_project.jetty.servlets.gzip.AbstractCompressedStream.write(AbstractCompressedStream.java:229)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:135)
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:220)
    at java.io.PrintWriter.write(PrintWriter.java:456)
    at java.io.PrintWriter.write(PrintWriter.java:473)
    at java.io.PrintWriter.print(PrintWriter.java:603)
    at org.apache.spark.ui.JettyUtils$$anon$2.doGet(JettyUtils.scala:86)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
    at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:479)
    at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.spark_project.jetty.server.Server.handle(Server.java:499)
    at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
    at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:744)
2017-02-23 21:01:48,024 WARN  org.spark_project.jetty.server.HttpChannel (HttpChannel.java:handle(384)) - /jobs/
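For context, getDispatcherType() was only added in Servlet 3.0, so this NoSuchMethodError usually means an older servlet-api jar (for example the 2.5 one bundled with Hadoop) is shadowing the 3.x one that Spark's bundled Jetty expects. A hedged diagnostic sketch, assuming a PySpark session named spark, to see which jar the driver actually loads the class from:

# Hypothetical check via the Py4J gateway: print the jar providing
# HttpServletRequest on the driver (may print None for classes loaded
# by the bootstrap classloader).
cls = spark._jvm.java.lang.Class.forName("javax.servlet.http.HttpServletRequest")
src = cls.getProtectionDomain().getCodeSource()
print(src.getLocation() if src is not None else None)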

I have also come across this error in pyspark; I resolved it by adding some jars to the spark-submit command:

--jars /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/spark-examples-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar
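If editing every spark-submit call is awkward, the same jar can also be supplied through the spark.jars configuration; a sketch, assuming the parcel path above is valid on your cluster:

# Sketch: pass the jar via configuration instead of the --jars flag.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.jars",
             "/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/lib/"
             "spark-examples-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar"))
sc = SparkContext(conf=conf)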
