I'm brand new to the Spark world, and I am very close to making a successful connection with my database. I can connect to my remote DataStax database, but the problem comes when I try to retrieve data from it. I use:
data = sparkSession.read.format("org.apache.spark.sql.cassandra").options(table="tbthesis", keyspace="test").load()
As I understand it, the problem is that I don't have the Cassandra connector, so the solution would be to start PySpark with it:
pyspark --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1
But for some reason I get the error: Java gateway process exited before sending its port number
I have seen recommendations in other, similar questions. Below are the things I tried, all with the same result: Java gateway process exited before sending its port number
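The _2.11 part of the package coordinate is the Scala binary version the connector was compiled for, and it needs to match the Scala version of the installed Spark (spark-submit --version prints it). As an illustration only, a small hypothetical helper (parse_coordinate is not part of any library) that splits such a coordinate so the suffix can be compared:

```python
def parse_coordinate(coordinate):
    """Split a Maven coordinate 'group:artifact_scala:version' into its parts."""
    group, artifact, version = coordinate.split(":")
    # Scala libraries conventionally append the Scala binary version after the last underscore
    name, sep, scala = artifact.rpartition("_")
    if not sep:  # no Scala suffix present
        name, scala = artifact, None
    return {"group": group, "artifact": name, "scala": scala, "version": version}


info = parse_coordinate("com.datastax.spark:spark-cassandra-connector_2.11:2.5.1")
print(info)
```

If the Scala version reported by Spark differs from the suffix (for example a Spark build on Scala 2.12), the connector artifact with the matching suffix would be needed instead.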
So, my original code is:
import os
from pyspark import SparkContext, SparkFiles
from pyspark.sql import SQLContext, SparkSession
from pyspark.sql.functions import col
# os.path.join builds the path portably instead of hard-coding a Windows backslash
secure_bundle_file = os.path.join(os.getcwd(), 'secure-connect-dbtest.zip')
sparkSession = (SparkSession.builder
    .appName('SparkCassandraApp')
    .config('spark.cassandra.connection.config.cloud.path', secure_bundle_file)
    .config('spark.cassandra.auth.username', 'test')
    .config('spark.cassandra.auth.password', 'testquart')
    .config('spark.dse.continuousPagingEnabled', False)
    .master('local[2]')
    .getOrCreate())
# Everything up to here works; the read below is what fails
data = sparkSession.read.format("org.apache.spark.sql.cassandra").options(table="tbthesis", keyspace="test").load()
With this, the error is: Java gateway process exited before sending its port number
I added the following recommended solutions to my code:
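One way to load the connector from a script, rather than from the pyspark command line, is Spark's spark.jars.packages setting, which asks Spark to resolve the Maven package when the JVM starts, much like --packages. A minimal sketch, assuming the connector coordinate from the question is the right one for the installed Spark; cassandra_conf and build_session are hypothetical helper names, and the setting only takes effect if no SparkSession has been created yet in the process:

```python
import os


def cassandra_conf(bundle_path, username, password,
                   package="com.datastax.spark:spark-cassandra-connector_2.11:2.5.1"):
    # Same settings as the builder chain in the question, collected in one dict.
    # "spark.jars.packages" is the added assumption: it fetches the connector
    # from Maven when the JVM starts, similar to `pyspark --packages`.
    return {
        "spark.jars.packages": package,
        "spark.cassandra.connection.config.cloud.path": bundle_path,
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
        "spark.dse.continuousPagingEnabled": "false",
    }


def build_session(conf, app_name="SparkCassandraApp", master="local[2]"):
    # Imported lazily so the config helper above works without PySpark installed
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master(master)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = cassandra_conf(os.path.join(os.getcwd(), "secure-connect-dbtest.zip"), "test", "testquart")
```

Calling build_session(conf) in a fresh Python process and then running the same read(...).load() would be the way to try it.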
Solution 1: os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 --master spark://127.0.0.1 pyspark'
Solution 2: os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.1 --master local[2] pyspark'
Neither of them worked for me; I got the same error. Apart from this, I have JAVA_HOME, SPARK_HOME and HADOOP_HOME properly set on my system.
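For reference, PySpark passes PYSPARK_SUBMIT_ARGS to spark-submit when it launches the JVM, and the value is commonly expected to end with the token pyspark-shell, whereas both attempts above end with pyspark. The variable also has to be set before the first SparkSession is created. A minimal sketch under those assumptions (set_submit_args is a hypothetical helper); note too that a spark:// master URL needs a running standalone master and normally includes a port (the standalone default is 7077):

```python
import os


def set_submit_args(packages, master="local[2]"):
    """Set PYSPARK_SUBMIT_ARGS; must run before any SparkSession is created."""
    args = ["--packages", ",".join(packages), "--master", master]
    # The launcher treats this string as spark-submit arguments; the final
    # token is expected to be "pyspark-shell" (not "pyspark")
    args.append("pyspark-shell")
    value = " ".join(args)
    os.environ["PYSPARK_SUBMIT_ARGS"] = value
    return value


print(set_submit_args(["com.datastax.spark:spark-cassandra-connector_2.11:2.5.1"]))
```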
So, what's going on?
Thanks!
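To double-check the JAVA_HOME, SPARK_HOME and HADOOP_HOME setup mentioned above from inside the same Python process that starts Spark, a small diagnostic sketch (check_spark_env is a hypothetical helper) can report whether each variable is set and points at a real directory. It is also worth confirming that the Java version is one the installed Spark release supports; for instance, the Spark 2.4 line documents Java 8.

```python
import os
import shutil


def check_spark_env(environ=None, names=("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")):
    """Report whether each variable is set and points at an existing directory."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in names:
        value = environ.get(name)
        report[name] = {"value": value, "is_dir": bool(value) and os.path.isdir(value)}
    # Spark's launcher uses $JAVA_HOME/bin/java when JAVA_HOME is set, otherwise
    # `java` from PATH; this line only checks the PATH case
    report["java_on_path"] = shutil.which("java") is not None
    return report


for key, result in check_spark_env().items():
    print(key, result)
```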