

Find all permutations of values in Spark RDD; python

I have a Spark RDD (myData) that has been mapped as a list. The output of myData.collect() yields the following:

['x', 'y', 'z']

What operation can I perform on myData to map to, or create, a new RDD containing a list of all permutations of xyz? For example, newData.collect() would output:

['xyz', 'xzy', 'zxy', 'zyx', 'yxz', 'yzx']

I've tried using variations of cartesian(myData), but as far as I can tell, the best that gives is different combinations of two-value pairs.
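For reference, a minimal sketch of what a single cartesian call yields (assuming a SparkContext named sc and the same three values as above); the result is every ordered two-value pair, including self-pairs:

>>> myData = sc.parallelize(['x', 'y', 'z'])
>>> sorted(myData.cartesian(myData).collect())   # all ordered pairs, including ('x', 'x')
[('x', 'x'), ('x', 'y'), ('x', 'z'), ('y', 'x'), ('y', 'y'), ('y', 'z'), ('z', 'x'), ('z', 'y'), ('z', 'z')]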

>>> from itertools import permutations
>>> t = ['x', 'y', 'z']
>>> ["".join(item) for item in permutations(t)]

['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

Note: an RDD can be converted to an iterable using toLocalIterator.
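A hedged sketch putting the two pieces together (assuming a SparkContext named sc and the three-element RDD from the question): pull the values to the driver with toLocalIterator, build the permutations locally with itertools, and parallelize the result back into an RDD if one is needed:

>>> from itertools import permutations
>>> myData = sc.parallelize(['x', 'y', 'z'])
>>> local_values = list(myData.toLocalIterator())   # bring the values to the driver
>>> newData = sc.parallelize(["".join(p) for p in permutations(local_values)])
>>> sorted(newData.collect())
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

This only scales to RDDs small enough to fit on the driver, since all n! permutations are built locally.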

Doing this all in pyspark. You can use rdd.cartesian, but you have to filter out repeats and do it twice (not saying this is good!):

 >>> rdd1 = rdd.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
 >>> rdd1.collect()
 ['xy', 'xz', 'yx', 'yz', 'zx', 'zy']
 >>> rdd2 = rdd1.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
 >>> rdd2.collect()
 ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
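
For more than three values, the same cartesian-and-filter step could in principle be repeated n - 1 times. A minimal sketch of that generalization (a hypothetical helper, assuming distinct single-character values as in the example and a SparkContext named sc):

 >>> def all_permutations(rdd):
 ...     # repeat the cartesian + filter + join step n - 1 times
 ...     n = rdd.count()
 ...     result = rdd
 ...     for _ in range(n - 1):
 ...         result = (result.cartesian(rdd)
 ...                         .filter(lambda x: x[1] not in x[0])   # skip characters already used
 ...                         .map(lambda x: ''.join(x)))
 ...     return result
 ...
 >>> sorted(all_permutations(sc.parallelize(['x', 'y', 'z'])).collect())
 ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

The number of elements grows factorially with each pass, so this is only workable for very small RDDs.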
