I have a list of tuples holding various combinations of items:
example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]
I wish to group and count by unique combinations, ignoring the order of the items,
yielding the result:
result = [((1,2), 3), ((1,1), 2), ((2,3,1), 2)]
It is not important that the order is maintained, or which permutation of each combination is preserved. It is very important, however, that the operation be done with a lambda function and that the output still be a list of tuples as above, because I will be working with a Spark RDD object.
My code currently counts patterns taken from a data set using
from operator import add

RDD = sc.parallelize(example)
result = RDD.map(lambda y: (y, 1))\
    .reduceByKey(add)\
    .collect()
print result
I need another .map
command that will account for the different permutations, as explained above.
You can use an OrderedDict
to create an ordered dictionary keyed on the sorted form of each item:
>>> from collections import OrderedDict
>>> d=OrderedDict()
>>> for i in example:
... d.setdefault(tuple(sorted(i)),i)
...
('a', 'b')
('a', 'a', 'a')
('a', 'a')
('a', 'b')
('c', 'd')
('b', 'c', 'a')
('b', 'c', 'a')
>>> d
OrderedDict([(('a', 'b'), ('a', 'b')), (('a', 'a', 'a'), ('a', 'a', 'a')), (('a', 'a'), ('a', 'a')), (('c', 'd'), ('c', 'd')), (('a', 'b', 'c'), ('b', 'c', 'a'))])
>>> d.values()
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
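The session above only removes duplicate permutations; the question also asks for counts. A minimal sketch of how the same OrderedDict idea might be extended to count, assuming the numeric example list from the question (the names `counts` and `entry` are illustrative, not from the original answer):

```python
from collections import OrderedDict

example = [(1, 2), (2, 1), (1, 1), (1, 1), (2, 1), (2, 3, 1), (1, 2, 3)]

counts = OrderedDict()
for item in example:
    key = tuple(sorted(item))                  # canonical form of the combination
    # Each entry is [first permutation seen, running count].
    entry = counts.setdefault(key, [item, 0])
    entry[1] += 1

result = [(first, n) for first, n in counts.values()]
# result == [((1, 2), 3), ((1, 1), 2), ((2, 3, 1), 2)]
```

This runs locally on a plain list; it is not a Spark transformation.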
How about this: maintain a set that contains the sorted form of each item you've already seen. Only add an item to the result list if you haven't seen its sorted form already.
example = [ ('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
result = []
seen = set()
for item in example:
    sorted_form = tuple(sorted(item))
    if sorted_form not in seen:
        result.append(item)
        seen.add(sorted_form)
print result
Result:
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
Since you are looking for a lambda function, try the following:
lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
You can use this lambda function like so:
uniquify = lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
result = uniquify(example)
Obviously, this sacrifices readability compared to the other answers. It basically does the same thing as Kasramvd's answer, in a single ugly line.
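One thing to watch with this lambda: the default argument `y=OrderedDict()` is created once, so entries from one call are still present on the next call. A rough sketch of the effect, plus one possible variant that builds a fresh dictionary per call (`uniquify_fresh` is a hypothetical name, and `list(...)` is wrapped around `values()` so the sketch also runs on Python 3):

```python
from collections import OrderedDict

# The lambda from the answer, with list() around values().
uniquify = lambda x, y=OrderedDict(): [
    a for a in x if y.setdefault(tuple(sorted(a)), a) and False
] or list(y.values())

leak1 = uniquify([('a', 'b')])   # [('a', 'b')]
leak2 = uniquify([('x', 'y')])   # [('a', 'b'), ('x', 'y')]: state carried over

# Hypothetical variant: the inner lambda receives a new OrderedDict on every call.
uniquify_fresh = lambda x: (
    lambda seen: [seen.setdefault(tuple(sorted(a)), a) for a in x]
    and list(seen.values())
)(OrderedDict())

first = uniquify_fresh([('a', 'b'), ('b', 'a'), ('c', 'd')])  # [('a', 'b'), ('c', 'd')]
second = uniquify_fresh([('x', 'y')])                         # [('x', 'y')]
```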
This is similar to the sorted-dict approach.
from itertools import groupby
ex = [(1,2,3), (3,2,1), (1,1), (2,1), (1,2), (3,2), (2,3,1)]
f = lambda x: tuple(sorted(x))  # used as the key: the sorted form of each tuple
[tuple(k) for k, _ in groupby(sorted(ex, key=f), key=f)]
The nice thing is that you can see which tuples belong to the same combination:
In [16]: example = [ ('a','b'), ('a','a','a'), ('a','a'), ('a', 'a', 'a', 'a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
In [17]: for k, grpr in groupby(sorted(example, key=lambda x: tuple(sorted(x))), key=lambda x: tuple(sorted(x))):
   ....:     print k, list(grpr)
   ....:
('a', 'a') [('a', 'a')]
('a', 'a', 'a') [('a', 'a', 'a')]
('a', 'a', 'a', 'a') [('a', 'a', 'a', 'a')]
('a', 'b') [('a', 'b'), ('b', 'a')]
('a', 'b', 'c') [('b', 'c', 'a'), ('a', 'b', 'c')]
('c', 'd') [('c', 'd')]
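Since each group holds every permutation of one combination, the group size is the count the question asks for. A small sketch under that reading, using the numeric list from the question (the variable names are illustrative):

```python
from itertools import groupby

example = [(1, 2), (2, 1), (1, 1), (1, 1), (2, 1), (2, 3, 1), (1, 2, 3)]

f = lambda x: tuple(sorted(x))  # canonical key: the sorted form of the tuple

# groupby only merges adjacent items, so the input must be sorted by the same key first.
counts = [(k, len(list(grp))) for k, grp in groupby(sorted(example, key=f), key=f)]
# counts == [((1, 1), 2), ((1, 2), 3), ((1, 2, 3), 2)]
```

Here the sorted form of each combination is what gets reported, which the question says is acceptable.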
What you actually seem to need, based on the comments, is map-reduce. I don't have Spark installed, but according to the docs (see transformations), it should look something like this:
data.map(lambda i: (frozenset(i), i)).reduceByKey(lambda _, i : i)
This, however, will return (b, a) if your dataset has (a, b), (b, a) in that order.
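Note also that a frozenset key ignores how many times a value repeats, so (1, 1), (1, 1, 1) and (1,) would all share one key, whereas a sorted-tuple key keeps them apart. The sketch below illustrates the difference with a plain-Python stand-in for reduceByKey; `reduce_by_key` is a hypothetical helper written for this illustration, not a Spark API, and nothing here has been run against Spark:

```python
from operator import add

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for reduceByKey, for illustration only."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return list(out.items())

example = [(1, 2), (2, 1), (1, 1), (1, 1), (2, 1), (2, 3, 1), (1, 2, 3)]

# Sorted-tuple keys: permutations merge, repeated values stay distinct.
by_sorted = reduce_by_key(((tuple(sorted(t)), 1) for t in example), add)
# dict(by_sorted) == {(1, 2): 3, (1, 1): 2, (1, 2, 3): 2}

# frozenset keys: tuples that differ only in repetition collapse into one key.
by_frozenset = reduce_by_key(((frozenset(t), 1) for t in [(1, 1), (1, 1, 1), (1,)]), add)
# by_frozenset == [(frozenset({1}), 3)]
```

On an RDD, the counting version would presumably be written as `RDD.map(lambda y: (tuple(sorted(y)), 1)).reduceByKey(add)`, with `add` imported from `operator`.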
I solved my own problem, although it was difficult to pin down what I was really looking for. I used:
from operator import add

example = [(1,2), (1,1,1), (1,1), (1,1), (2,1), (3,4), (2,3,1), (1,2,3)]
RDD = sc.parallelize(example)
result = RDD.map(lambda x: list(set(x)))\
    .filter(lambda x: len(x)>1)\
    .map(lambda x: (tuple(x), 1))\
    .reduceByKey(add)\
    .collect()
print result
which also eliminated tuples that consist of a single repeated value, such as (1,1) and (1,1,1), which was an added benefit for me.
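One caveat: `list(set(x))` relies on the iteration order of a set, which Python does not guarantee, so sorting the set before building the key is a safer way to make permutations land on the same key. A local sketch of the same map / filter / count steps with that change, using `collections.Counter` in place of Spark (so this is an approximation of the pipeline, not the RDD code itself):

```python
from collections import Counter

example = [(1, 2), (1, 1, 1), (1, 1), (1, 1), (2, 1), (3, 4), (2, 3, 1), (1, 2, 3)]

# map: drop repeated values, then sort so every permutation yields the same key
keys = (tuple(sorted(set(x))) for x in example)

# filter + count: keep only keys with more than one distinct value
counts = Counter(k for k in keys if len(k) > 1)

result = sorted(counts.items())
# result == [((1, 2), 2), ((1, 2, 3), 2), ((3, 4), 1)]
```

In the RDD version, the corresponding change would be to map with `lambda x: tuple(sorted(set(x)))` instead of `list(set(x))`.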