

Find all permutations of values in Spark RDD; python

I have a Spark RDD (myData) that has been mapped as a list. The output of myData.collect() yields the following:

['x', 'y', 'z']

What operation can I perform on myData to map to, or create, a new RDD containing a list of all permutations of xyz? For example, newData.collect() would output:

['xyz', 'xzy', 'zxy', 'zyx', 'yxz', 'yzx']

I've tried using variations of cartesian(myData), but as far as I can tell, the best that gives is different combinations of two-value pairs.
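For reference, a minimal sketch of what a single cartesian call yields (assuming a SparkContext named sc and the same three values as above); the result is every ordered two-value pair, including self-pairs:

>>> myData = sc.parallelize(['x', 'y', 'z'])
>>> sorted(myData.cartesian(myData).collect())   # all ordered pairs, including ('x', 'x')
[('x', 'x'), ('x', 'y'), ('x', 'z'), ('y', 'x'), ('y', 'y'), ('y', 'z'), ('z', 'x'), ('z', 'y'), ('z', 'z')]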

>>> from itertools import permutations
>>> t = ['x', 'y', 'z']
>>> ["".join(item) for item in permutations(t)]

['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

Note: an RDD can be converted to an iterable using toLocalIterator.
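A hedged sketch putting the two pieces together (assuming a SparkContext named sc and the three-element RDD from the question): pull the values to the driver with toLocalIterator, build the permutations locally with itertools, and parallelize the result back into an RDD if one is needed:

>>> from itertools import permutations
>>> myData = sc.parallelize(['x', 'y', 'z'])
>>> local_values = list(myData.toLocalIterator())   # bring the values to the driver
>>> newData = sc.parallelize(["".join(p) for p in permutations(local_values)])
>>> sorted(newData.collect())
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

This only scales to RDDs small enough to fit on the driver, since all n! permutations are built locally.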

Doing this all in pyspark. You can use rdd.cartesian, but you have to filter out repeats and do it twice (not saying this is good!):

 >>> rdd1 = rdd.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
 >>> rdd1.collect()
 ['xy', 'xz', 'yx', 'yz', 'zx', 'zy']
 >>> rdd2 = rdd1.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
 >>> rdd2.collect()
 ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
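
For more than three values, the same cartesian-and-filter step could in principle be repeated n - 1 times. A minimal sketch of that generalization (a hypothetical helper, assuming distinct single-character values as in the example and a SparkContext named sc):

 >>> def all_permutations(rdd):
 ...     # repeat the cartesian + filter + join step n - 1 times
 ...     n = rdd.count()
 ...     result = rdd
 ...     for _ in range(n - 1):
 ...         result = (result.cartesian(rdd)
 ...                         .filter(lambda x: x[1] not in x[0])   # skip characters already used
 ...                         .map(lambda x: ''.join(x)))
 ...     return result
 ...
 >>> sorted(all_permutations(sc.parallelize(['x', 'y', 'z'])).collect())
 ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

The number of elements grows factorially with each pass, so this is only workable for very small RDDs.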
