What would be the most pythonic way of achieving the transformation from the following input:
input = [('a', 1), ('a', 10), ('b', 244), ('c', 31) , ('c',45)]
to the desired output:
output = [[('a',1),('a',10)],[('c',31),('c',45)]]
where I have grouped in lists the tuples which have the same first element.
Feeling that Python has strong potential (I'm new to it) for writing complicated things on one line, I decided to use list comprehensions. My initial attempt is something like:
output = [x for x in input if [k[0] for k in input].count(x[0])>1]
giving me a nice list of all my "pseudo" duplicates:
output = [('a',1),('a',10),('c',31),('c',45)]
which I further process to obtain my result.
My question is: is there a way to achieve this result in one line using list comprehensions instead of two (ugly) steps?
Use groupby from itertools and a list comprehension. This will give you a simple one-liner:
from itertools import groupby
list(filter(lambda x: len(x)>1, [list(g) for i,g in groupby(input, key=lambda x: x[0])]))
[[('a', 1), ('a', 10)], [('c', 31), ('c', 45)]]
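One caveat worth noting: groupby only merges consecutive items with equal keys, so the one-liner above relies on the input already being ordered by first element. A minimal sketch for input that may be unsorted follows; the helper name group_duplicates is made up for illustration and is not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter


def group_duplicates(pairs):
    """Return the groups of tuples that share a first element (size > 1 only)."""
    first = itemgetter(0)
    # sorted() is stable, so tuples keep their original relative order
    # inside each group.
    ordered = sorted(pairs, key=first)
    groups = (list(g) for _, g in groupby(ordered, key=first))
    return [g for g in groups if len(g) > 1]


unsorted_pairs = [('c', 31), ('a', 1), ('b', 244), ('a', 10), ('c', 45)]
print(group_duplicates(unsorted_pairs))
# [[('a', 1), ('a', 10)], [('c', 31), ('c', 45)]]
```

Sorting costs O(n log n), so for large inputs a dictionary-based grouping may be the cheaper option.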
Using a 1-liner list comprehension:
>>> L=[('a', 1), ('a', 10), ('b', 244), ('c', 31) , ('c',45)]
>>> [list(filter(lambda x:x[0]==i, L)) for i in set(map(lambda x:x[0], L)) if len(list(filter(lambda x:x[0]==i, L)))>1]
[[('a', 1), ('a', 10)], [('c', 31), ('c', 45)]]
Use itertools.groupby. My solution is not a one-liner, but it is more readable.
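import itertools
lists_in = [('a', 1), ('a', 10), ('b', 244), ('c', 31) , ('c',45)]
lists_out = list()
# groupby only merges adjacent items, so lists_in must already be ordered by key
for name, group in itertools.groupby(lists_in, key=lambda x:x[0]):
    l = list(group)
    if len(l) > 1:  # keep every key that occurs more than once
        lists_out.append(l)  # append keeps each group as its own list
print(lists_out)
# Output
[[('a', 1), ('a', 10)], [('c', 31), ('c', 45)]]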
There is nothing wrong with the following:
input = [('a', 1), ('a', 10), ('b', 244), ('c', 31) , ('c',45)]
d = {}
for i in input:
    if i[0] in d:
        d[i[0]].append(i)
    else:
        d[i[0]] = [i]
print([d[k] for k in d if len(d[k]) > 1])
Don't forget, you have to maintain a balance between readability and cleverness.
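The same dictionary idea can be written a little more compactly with collections.defaultdict, which removes the if/else branch. This is only a sketch of an alternative spelling; the function name group_by_first is an assumption made for illustration.

```python
from collections import defaultdict


def group_by_first(pairs):
    """Group tuples by their first element, keeping groups with more than one item."""
    groups = defaultdict(list)
    for pair in pairs:
        # A missing key is created automatically with an empty list.
        groups[pair[0]].append(pair)
    # Dicts preserve insertion order (Python 3.7+), so groups come out
    # in the order their key was first seen.
    return [group for group in groups.values() if len(group) > 1]


pairs = [('a', 1), ('a', 10), ('b', 244), ('c', 31), ('c', 45)]
print(group_by_first(pairs))
# [[('a', 1), ('a', 10)], [('c', 31), ('c', 45)]]
```

Unlike groupby, this does not need the input to be sorted, and it makes a single pass over the data.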
Later edit: I gathered the solutions from the other answers and measured their execution time (200000 uniformly distributed tuples with 'a'-'z' as the first element); see below:
import itertools

# 0.048532 s
def foo(input):
    d = {}
    for i in input:
        if i[0] in d:
            d[i[0]].append(i)
        else:
            d[i[0]] = [i]
    return len(([d[k] for k in d if len(d[k]) > 1]))

# 1.9594 s
def foo2(input):
    [list(filter(lambda x:x[0]==i, input)) for i in set(map(lambda x:x[0], input)) if len(list(filter(lambda x:x[0]==i, input)))>1]

# 0.209639 s
def foo3(input):
    [filter(lambda x: len(x)>1, [list(g) for i,g in itertools.groupby(input, key=lambda x: x[0])])]

# 0.188625 s
def foo4(input):
    lists = list()
    for name, group in itertools.groupby(input, key=lambda x: x[0]):
        l = list(group)
        if len(l) == 2:
            lists.extend(l)

# didn't even finish, >120 s
def foo5(input_list):
    [[x for x in input_list if x[0]==a] for a in {x[0] for x in input_list if [k[0] for k in input_list].count(x[0])>1}]
So yes, there are more clever one-line solutions, but since they are slower and harder to read, they are not really the "most pythonic".
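The harness used for these measurements is not shown, so the following is only an assumed reconstruction of how such a comparison could be run with timeit. The data generator, the function names and the smaller input size are all illustrative choices, and the groupby variant here sorts its input first, which the timed versions above do not.

```python
import random
import string
import timeit
from itertools import groupby


def make_data(n, seed=0):
    """Build n (letter, number) tuples with uniformly random 'a'-'z' keys."""
    rng = random.Random(seed)
    return [(rng.choice(string.ascii_lowercase), rng.randrange(1000))
            for _ in range(n)]


def dict_groups(pairs):
    """Single pass with a plain dict."""
    d = {}
    for p in pairs:
        d.setdefault(p[0], []).append(p)
    return [g for g in d.values() if len(g) > 1]


def groupby_groups(pairs):
    """Sort by key, then let groupby merge adjacent equal keys."""
    def key(p):
        return p[0]
    grouped = (list(g) for _, g in groupby(sorted(pairs, key=key), key=key))
    return [g for g in grouped if len(g) > 1]


data = make_data(20000)
runs = 5
for fn in (dict_groups, groupby_groups):
    elapsed = timeit.timeit(lambda: fn(data), number=runs)
    print(f"{fn.__name__}: {elapsed / runs:.6f} s per call")
```

Absolute numbers depend on the machine and Python version, so only the relative ordering of the approaches is meaningful.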
Here is one solution:
>>> input_list = [('a', 1), ('a', 10), ('b', 244), ('c', 31) , ('c',45)]
>>> [[x for x in input_list if x[0]==a] for a in {x[0] for x in input_list if [k[0] for k in input_list].count(x[0])>1}]
which prints (the order of the groups may vary, since they come from a set):
[[('a', 1), ('a', 10)], [('c', 31), ('c', 45)]]
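Note that this expression rebuilds the list of first elements and calls count() for every tuple, which is quadratic in the input size and consistent with the ">120 s" timing reported above. A hedged sketch of the same comprehension-style idea that counts the keys only once, using collections.Counter (the variable names are illustrative):

```python
from collections import Counter

input_list = [('a', 1), ('a', 10), ('b', 244), ('c', 31), ('c', 45)]

# Count each first element in a single pass.
counts = Counter(key for key, _ in input_list)

# Counter is a dict subclass, so iteration follows first-seen order
# (Python 3.7+), unlike a set.
repeated = [key for key in counts if counts[key] > 1]

result = [[x for x in input_list if x[0] == key] for key in repeated]
print(result)
# [[('a', 1), ('a', 10)], [('c', 31), ('c', 45)]]
```

This still rescans the list once per repeated key, so it is best suited to inputs with few distinct keys.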