
Problem with package com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 in pyspark

I'm brand new to the Spark world, and I am very close to making a successful connection with my database. I can connect to my DataStax remote database, but the problem comes when I want to retrieve information from it. I use:

data = sparkSession.read.format("org.apache.spark.sql.cassandra").options(table="tbthesis", keyspace="test").load()

Since the problem is that I don't have the Cassandra connector, the solution would be:

pyspark --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1

But for some reason I get the error: Java gateway process exited before sending its port number.

I have seen recommendations on similar questions; below are the things I have tried, all with the same result, the error: Java gateway process exited before sending its port number.

So, my original code is:

import os
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

secure_bundle_file = os.getcwd() + '\\secure-connect-dbtest.zip'
sparkSession = SparkSession.builder \
    .appName('SparkCassandraApp') \
    .config('spark.cassandra.connection.config.cloud.path', secure_bundle_file) \
    .config('spark.cassandra.auth.username', 'test') \
    .config('spark.cassandra.auth.password', 'testquart') \
    .config('spark.dse.continuousPagingEnabled', False) \
    .master('local[2]') \
    .getOrCreate()
# Up to here it is fine; the "read" below is what fails
data = sparkSession.read.format("org.apache.spark.sql.cassandra") \
    .options(table="tbthesis", keyspace="test") \
    .load()

With this, the error is: Java gateway process exited before sending its port number.

I added the following recommended solutions to my code:

Solution 1 : environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 --master spark://127.0.0.1  pyspark'

Solution 2 : environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 --master local[2]  pyspark'

None of them worked for me; I got the same error. Besides this, I have JAVA_HOME, SPARK_HOME, and HADOOP_HOME properly set on my system.
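For reference, this is how I understand the environment-variable approach is supposed to work (a minimal sketch; the trailing `pyspark-shell` token and the set-before-import ordering are my reading of the PySpark docs, and the package coordinate is the one from above):

```python
import os

# PYSPARK_SUBMIT_ARGS is read once, when PySpark launches the JVM gateway,
# so it has to be set before the first `import pyspark`. Note the final
# token: it should be the literal "pyspark-shell", not "pyspark".
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 '
    '--master local[2] pyspark-shell'
)

# Only after this point:
#   from pyspark.sql import SparkSession
```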

So, what's going on?

Thanks!

