How to increase speed when writing Spark DataFrame to Redis?

I am developing a book recommendation API based on Flask, and I found that to handle multiple requests I need to pre-calculate a similarity matrix and store it somewhere for future queries. This matrix is created with PySpark from ~1.5 million database entries containing book id, name and metadata, and the result can be described by this schema (i and j are book indexes, dot is the similarity of their metadata):


Initially, I intended to store it in Redis using the spark-redis connector. However, the following command runs very slowly (even when the initial book database query is limited to a very modest 40k batch):

similarities.write.format("org.apache.spark.sql.redis").option("table", "similarities").option("key.column", "i").save()

It took around 6 hours to advance through 3 of the 9 stages that Spark split the initial task into. Strangely, storage memory usage by the Spark executors was very low, around 20 KB. A typical active stage is described like this in the Spark Application UI:

sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Is it possible to somehow speed up this process? My Spark session is set up this way:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

SUBMIT_ARGS = "  --driver-memory 2G --executor-memory 2G --executor-cores 4 --packages mysql:mysql-connector-java:5.1.39 pyspark-shell"
conf = SparkConf().set("spark.jars", "spark-redis/target/spark-redis_2.11-2.4.3-SNAPSHOT-jar-with-dependencies.jar").set("spark.executor.memory", "4g")
sc = SparkContext('local','example', conf=conf)
sql_sc = SQLContext(sc)

You can try the Append save mode, which avoids checking whether the data already exists in the table:

similarities.write.format("org.apache.spark.sql.redis").option("table", "similarities").mode('append').option("key.column", "i").save()

Also, you may want to change

sc = SparkContext('local','example', conf=conf)

to

sc = SparkContext('local[*]','example', conf=conf)

to utilize all cores on your machine.
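
For context, 'local' runs Spark with a single worker thread, 'local[K]' with K threads, and 'local[*]' with one thread per logical core. The sketch below only builds those master strings; `local_master` is a hypothetical helper written for illustration, not part of PySpark.

```python
import os

def local_master(threads=None):
    """Build a Spark local-mode master string (hypothetical helper)."""
    if threads is None:
        return "local[*]"  # one worker thread per logical core
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return f"local[{threads}]"

print(local_master())    # local[*]
print(local_master(1))   # local[1], same parallelism as plain 'local'
print(os.cpu_count())    # logical cores on this machine (may be None)
```

The resulting string would be passed as the first argument to SparkContext, as in the line above.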

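BTW, is it correct to use i as the key in Redis? Shouldn't it be a composite of both i and j? As far as I know, each row is stored under a key built from the table name and the key column value, so rows that share the same i would overwrite one another.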
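
To illustrate the key question, here is a minimal pure-Python sketch that uses a dict as a stand-in for Redis. It assumes each row ends up under a key of the form table:key, which is my understanding of the connector's default and should be verified. The sample rows and the `store` helper are made up for the example.

```python
# Made-up sample rows with the same columns as the similarity matrix.
rows = [
    {"i": 1, "j": 2, "dot": 0.9},
    {"i": 1, "j": 3, "dot": 0.4},
    {"i": 2, "j": 3, "dot": 0.7},
]

def store(rows, key_fn, table="similarities"):
    """Write rows into a dict keyed by '<table>:<key>' (hypothetical helper)."""
    keyspace = {}
    for row in rows:
        # A later row with the same key replaces the earlier one.
        keyspace[f"{table}:{key_fn(row)}"] = row
    return keyspace

by_i = store(rows, lambda r: r["i"])                    # key on i only
by_pair = store(rows, lambda r: f'{r["i"]}:{r["j"]}')   # key on i and j

print(len(by_i))     # 2, the two rows with i == 1 collapsed into one
print(len(by_pair))  # 3, every (i, j) pair is kept
```

In PySpark, a combined column of this kind could be built with `concat_ws` from `pyspark.sql.functions` and then named in the key.column option.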

