
Unique Combinations in a list of k,v tuples in Python

I have a list of various combos of items in tuples

example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]

I wish to group and count by unique combinations

yielding the result

result = [((1,2), 3), ((1,1), 2), ((2,3,1), 2)]

It is not important that the order is maintained, or which permutation of each combination is preserved. It is very important, however, that the operation be done with a lambda function and that the output still be a list of tuples as above, because I will be working with a Spark RDD object.

My code currently counts patterns taken from a data set using

from operator import add

RDD = sc.parallelize(example)
result = RDD.map(lambda y: (y, 1))\
            .reduceByKey(add)\
            .collect()
print(result)

I need another .map command that will account for the different permutations, as explained above.

You can use an OrderedDict to create an ordered dictionary keyed on the sorted form of each item:

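>>> from collections import OrderedDict
>>> example = [('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c','d'), ('b','c','a'), ('a','b','c')]
>>> d=OrderedDict()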
>>> for i in example:
...   d.setdefault(tuple(sorted(i)),i)
... 
('a', 'b')
('a', 'a', 'a')
('a', 'a')
('a', 'b')
('c', 'd')
('b', 'c', 'a')
('b', 'c', 'a')
>>> d
OrderedDict([(('a', 'b'), ('a', 'b')), (('a', 'a', 'a'), ('a', 'a', 'a')), (('a', 'a'), ('a', 'a')), (('c', 'd'), ('c', 'd')), (('a', 'b', 'c'), ('b', 'c', 'a'))])
>>> d.values()
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
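
If the counts from the question are needed as well, the same sorted-tuple key can feed a collections.Counter. This is only a plain-Python sketch (no Spark), assuming Python 3.7+ so that the first-seen order of keys is preserved; it uses the numeric example from the question.

```python
from collections import Counter

example = [(1, 2), (2, 1), (1, 1), (1, 1), (2, 1), (2, 3, 1), (1, 2, 3)]

# Sorting each tuple gives every permutation of a combination the same key.
counts = Counter(tuple(sorted(combo)) for combo in example)

# A list of (combination, count) tuples, the format asked for in the question.
result = list(counts.items())
print(result)  # [((1, 2), 3), ((1, 1), 2), ((1, 2, 3), 2)]
```

The preserved permutation here is always the sorted one, which the question says is acceptable.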

How about this: maintain a set that contains the sorted form of each item you've already seen. Only add an item to the result list if you haven't seen its sorted form already.

example = [ ('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
result = []
seen = set()
for item in example:
    sorted_form = tuple(sorted(item))
    if sorted_form not in seen:
        result.append(item)
        seen.add(sorted_form)
print(result)

Result:

[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]

Since you are looking for a lambda function, try the following:

lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()

You can use this lambda function like so:

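from collections import OrderedDict
# Note: the default y is created once, so entries persist between calls to uniquify
uniquify = lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
result = uniquify(example)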

Obviously, this sacrifices readability compared with the other answers. It basically does the same thing as Kasramvd's answer, in a single ugly line.
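
One thing to watch with this one-liner: the default OrderedDict is built once, when the lambda is defined, so a second call still sees the keys from the first. The sketch below shows that behaviour and a variant with a fresh dict per call. The name uniquify_fresh is made up for illustration, and the variant assumes Python 3.7+ for insertion-ordered dicts.

```python
from collections import OrderedDict

uniquify = lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()

# list(...) takes a snapshot, because y.values() is a live view in Python 3.
first = list(uniquify([('a', 'b'), ('b', 'a')]))
second = list(uniquify([('c', 'd')]))  # still carries ('a', 'b') from the first call

def uniquify_fresh(items):
    """Hypothetical variant: same idea, but with a new dict on every call."""
    seen = {}
    for item in items:
        # Keep the first tuple seen for each sorted form.
        seen.setdefault(tuple(sorted(item)), item)
    return list(seen.values())

print(first)                         # [('a', 'b')]
print(second)                        # [('a', 'b'), ('c', 'd')]
print(uniquify_fresh([('c', 'd')]))  # [('c', 'd')]
```

Passing an explicit second argument, as in uniquify(example, OrderedDict()), also avoids the shared state while keeping the lambda.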

This is similar to the sorted-dict approach.

from itertools import groupby
ex = [(1,2,3), (3,2,1), (1,1), (2,1), (1,2), (3,2), (2,3,1)]
f = lambda x: tuple(sorted(x))  # used as both the sort key and the grouping key
[tuple(k) for k, _ in groupby(sorted(ex, key=f), key=f)]

The nice thing is that you can see which tuples belong to the same combination:

In [16]: example = [ ('a','b'), ('a','a','a'), ('a','a'), ('a', 'a', 'a', 'a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
In [17]: for k, grpr in groupby(sorted(example, key=lambda x: tuple(sorted(x))), key=lambda x: tuple(sorted(x))):
   ....:     print k, list(grpr)
   ....:
('a', 'a') [('a', 'a')]
('a', 'a', 'a') [('a', 'a', 'a')]
('a', 'a', 'a', 'a') [('a', 'a', 'a', 'a')]
('a', 'b') [('a', 'b'), ('b', 'a')]
('a', 'b', 'c') [('b', 'c', 'a'), ('a', 'b', 'c')]
('c', 'd') [('c', 'd')]
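
Since each group holds all the permutations of one combination, taking the group's length gives the count the question asks for. A small sketch on the question's numeric example, in plain Python 3 with no Spark:

```python
from itertools import groupby

example = [(1, 2), (2, 1), (1, 1), (1, 1), (2, 1), (2, 3, 1), (1, 2, 3)]

# Same key for sorting and grouping: groupby only merges adjacent items.
key = lambda combo: tuple(sorted(combo))

counted = [(k, len(list(group)))
           for k, group in groupby(sorted(example, key=key), key=key)]
print(counted)  # [((1, 1), 2), ((1, 2), 3), ((1, 2, 3), 2)]
```

The result comes out ordered by key rather than by first appearance, which the question says does not matter.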

Based on the comments, what you actually seem to need is map-reduce. I don't have Spark installed, but according to the docs (see transformations), it should look something like this:

data.map(lambda i: (frozenset(i), i)).reduceByKey(lambda _, i : i)

Note, however, that this will return (b, a) if your dataset contains (a, b), (b, a) in that order.
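
A further caveat with a frozenset key is that it ignores repetition as well as order, so (1, 1) and (1, 1, 1) end up under the same key, whereas a sorted tuple keeps them apart. The sketch below compares the two keys in plain Python. reduce_by_key is a hypothetical local stand-in written for this illustration, not Spark's implementation, and the data is made up.

```python
from operator import add

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey: fold values per key."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())

example = [(1, 2), (2, 1), (1, 1), (1, 1, 1), (2, 3, 1), (1, 2, 3)]

# Key that ignores order but keeps repeated elements.
by_sorted = reduce_by_key(((tuple(sorted(t)), 1) for t in example), add)

# Key that ignores both order and repetition.
by_frozenset = reduce_by_key(((frozenset(t), 1) for t in example), add)

print(by_sorted)          # [((1, 2), 2), ((1, 1), 1), ((1, 1, 1), 1), ((1, 2, 3), 2)]
print(len(by_frozenset))  # 3
```

On an actual RDD, the corresponding calls would presumably be rdd.map(lambda t: (tuple(sorted(t)), 1)).reduceByKey(add); that line was not run here.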

I solved my own problem, though it was difficult to work out what I was really looking for. I used:

from operator import add

example = [(1,2), (1,1,1), (1,1), (1,1), (2,1), (3,4), (2,3,1), (1,2,3)]
RDD = sc.parallelize(example)
# sorted(set(x)) drops repeats within a tuple and puts the values in a stable order
result = RDD.map(lambda x: sorted(set(x)))\
            .filter(lambda x: len(x)>1)\
            .map(lambda x: (tuple(x), 1))\
            .reduceByKey(add)\
            .collect()
print(result)

This also eliminated tuples made of a single repeated value, such as (1,1) and (1,1,1), which was an added benefit for me.
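
To sanity-check this pipeline without a cluster, the same three steps (dedupe within each tuple, drop single-value tuples, count) can be mirrored in plain Python. This is only a local sketch. It sorts the deduplicated values so that equal sets always produce the same key, which is an assumption added here, and it uses Counter in place of the map to (key, 1) plus reduceByKey(add).

```python
from collections import Counter

example = [(1, 2), (1, 1, 1), (1, 1), (1, 1), (2, 1), (3, 4), (2, 3, 1), (1, 2, 3)]

# Step 1 (map): drop repeated values and fix the order within each tuple.
deduped = (tuple(sorted(set(combo))) for combo in example)

# Step 2 (filter): discard tuples that collapsed to a single value.
kept = (combo for combo in deduped if len(combo) > 1)

# Step 3 (count): tally each remaining combination.
result = list(Counter(kept).items())
print(result)  # [((1, 2), 2), ((3, 4), 1), ((1, 2, 3), 2)]
```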

