
Write data to Redis from PySpark

In Scala, we would write an RDD to Redis like this:

import com.redis.RedisClient  // assuming the scala-redis client, which matches this constructor

datardd.foreachPartition(iter => {
  // One connection per partition, created on the executor
  val r = new RedisClient("hosturl", 6379)
  iter.foreach(i => {
    val (str, it) = i
    val map = it.toMap
    r.hmset(str, map)  // store each (key, pairs) element as a Redis hash
  })
})

I tried doing this in PySpark like this: datardd.foreachPartition(storeToRedis), where the function storeToRedis is defined as:

import redis

def storeToRedis(partition):
    # One connection per partition; hmset mirrors the Scala hmset above
    # (newer redis-py versions spell this r.hset(key, mapping=...))
    r = redis.StrictRedis(host='hosturl', port=6379)
    for key, it in partition:
        r.hmset(key, dict(it))

It gives me this:

ImportError: ('No module named redis', <function subimport at 0x47879b0>, ('redis',))

Of course, I have imported redis.
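
The import succeeds on the driver, but foreachPartition pickles storeToRedis and runs it on the executors, whose Python environment cannot find the redis package. One way to confirm this (a hedged diagnostic sketch; sc is the question's SparkContext) is to attempt the import on the workers themselves:

def try_import(_):
    # Runs on an executor; raises the same ImportError if redis
    # is missing from the workers' Python environment
    import redis
    return redis.__version__

print(sc.parallelize([0]).map(try_import).collect())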

PySpark's SparkContext has an addPyFile method specifically for this. Make the redis module a zip file (like this) and just call this method:

from pyspark import SparkContext

sc = SparkContext(appName = "analyze")
sc.addPyFile("/path/to/redis.zip")  # ships the zipped module to every executor
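
Putting the pieces together, a minimal sketch under the question's assumptions ('hosturl' and the (key, pairs) RDD shape are placeholders from the question): import redis inside the partition function so the import resolves on the executors, after addPyFile has shipped the zip.

def storeToRedis(partition):
    # Imported here so the lookup happens on the executor, where
    # addPyFile has made the zipped redis package importable
    import redis
    r = redis.StrictRedis(host='hosturl', port=6379)
    for key, it in partition:
        r.hmset(key, dict(it))

datardd.foreachPartition(storeToRedis)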
