Find all permutations of values in Spark RDD; python
I have a Spark RDD (myData) that has been mapped as a list. The output of myData.collect() yields the following:
['x', 'y', 'z']
What operation can I perform on myData to map to or create a new RDD containing a list of all permutations of xyz? For example, newData.collect() would output:
['xyz', 'xzy', 'zxy', 'zyx', 'yxz', 'yzx']
I've tried using variations of cartesian(myData), but as far as I can tell, the best that gives is different combinations of two-value pairs.
>>> from itertools import permutations
>>> t = ['x', 'y', 'z']
>>> ["".join(item) for item in permutations(t)]
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
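Since the permutations of a small RDD fit comfortably on the driver, the usual pattern is to collect, permute locally, and redistribute. Below is a minimal sketch of that driver-side step, modelled with a plain list so it runs without a cluster; in real pyspark the input would come from `myData.collect()` and the result would be wrapped in `sc.parallelize(...)` (both assumed here, not shown):

```python
from itertools import permutations

def permute_collected(values):
    """Driver-side helper: every ordering of `values`, joined into strings.

    In pyspark this would be applied to myData.collect(), and the returned
    list handed to sc.parallelize(...) to get the new RDD.
    """
    return ["".join(p) for p in permutations(values)]

collected = ["x", "y", "z"]            # stand-in for myData.collect()
new_data = permute_collected(collected)
print(new_data)
# → ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
```

Note that collecting is only reasonable when the element count is small; with n elements the output has n! entries, so this does not scale to large RDDs.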
Note: an RDD object can be converted to an iterable using toLocalIterator, keeping all of this in pyspark.
You can use rdd.cartesian, but you have to filter out repeats and do it twice (not saying this is good!):
>>> rdd1 = rdd.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
>>> rdd1.collect()
['xy', 'xz', 'yx', 'yz', 'zx', 'zy']
>>> rdd2 = rdd1.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
>>> rdd2.collect()
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']