
pythonic way to filter list for elements with unique length

I want to filter a list, keeping only the first element of each distinct length. I wrote a function for it, but I believe there should be a simpler way of doing it:

def uniq_len(_list):
    from itertools import groupby
    uniq_lens = list(set([x for x, g in groupby(_list, len)]))
    all_goods = []
    for elem in _list:
        elem_len = len(elem)
        try:
            good = uniq_lens.pop([i for i, x in enumerate(uniq_lens) if x==elem_len][0])
            if good:
                all_goods.append(elem)
        except IndexError as _e:
            #print all_goods
            pass
    return all_goods

In [97]: jones
Out[97]: ['bob', 'james', 'jim', 'jon', 'bill', 'susie', 'jamie']

In [98]: uniq_len(jones)
Out[98]: ['bob', 'james', 'bill']

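If you just want one string for each length and don't care which one, the easy way to do this is to first build a dict mapping lengths to strings, then just read off the values. Later strings overwrite earlier ones, so this keeps the last string of each length. On Python 3.7+ the values come out in the order in which each length first appears; on older versions the order is arbitrary:

>>> {len(s): s for s in jones}.values()
dict_values(['jon', 'jamie', 'bill'])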

If you want the first string for each length, and you need to preserve the order, then that's just unique_everseen from the itertools recipes, with len as the key:

>>> from more_itertools import unique_everseen
>>> list(unique_everseen(jones, key=len))
['bob', 'james', 'bill']

(If you pip install more-itertools, it includes all of the recipes from the itertools docs, plus a bunch of other helpful things.)
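If installing a package is not an option, the idea is short enough to write by hand. The following is a simplified sketch of it, not the exact code from the itertools docs; the name first_per_key is made up for illustration, and it assumes the key values are hashable.

```python
def first_per_key(iterable, key):
    """Yield each item whose key(item) has not been seen before.

    Simplified sketch of the unique_everseen idea; assumes key(item)
    is hashable so it can be stored in a set.
    """
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


jones = ['bob', 'james', 'jim', 'jon', 'bill', 'susie', 'jamie']
print(list(first_per_key(jones, key=len)))  # ['bob', 'james', 'bill']
```

Because it is a generator, it works lazily and keeps the items in their original order.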

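To get the first item of each length (not necessarily in the same order as they appear in the list), build the dict from the reversed list, so that the earliest item of each length is written last and wins. The output below is what Python 3.7+ prints; older versions may order the values differently:

>>> lst = ['bob', 'james', 'jim', 'jon', 'bill', 'susie', 'jamie']
>>> list({len(x): x for x in reversed(lst)}.values())
['james', 'bill', 'bob']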
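A related dict-based sketch avoids reversing the list by using dict.setdefault, which stores a value only the first time a key is seen. This assumes Python 3.7+, where dicts preserve insertion order; the function name first_of_each_length is hypothetical.

```python
def first_of_each_length(items):
    """Return the first item of each length, in order of first appearance.

    Assumes Python 3.7+, where dicts preserve insertion order.
    """
    by_length = {}
    for item in items:
        # setdefault stores the item only if this length is not a key yet
        by_length.setdefault(len(item), item)
    return list(by_length.values())


lst = ['bob', 'james', 'jim', 'jon', 'bill', 'susie', 'jamie']
print(first_of_each_length(lst))  # ['bob', 'james', 'bill']
```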

Respecting the order of the original list, you can use an auxiliary set:

>>> seen = set()
>>> [x for x in lst if len(x) not in seen and seen.add(len(x)) is None]
['bob', 'james', 'bill']

set.add always returns None, so the second condition is always true and serves only to record the length. If you run the expression more than once, make sure you reset seen to an empty set each time.
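One way to avoid resetting seen by hand is to wrap the expression in a function, so that every call starts with a fresh set. This is only a sketch; it reuses the name uniq_len from the question.

```python
def uniq_len(items):
    """Keep the first item of each length, preserving the original order."""
    seen = set()  # a fresh set on every call
    # set.add returns None, so `not seen.add(...)` is always True; it is
    # evaluated only when the length has not been seen yet
    return [x for x in items if len(x) not in seen and not seen.add(len(x))]


lst = ['bob', 'james', 'jim', 'jon', 'bill', 'susie', 'jamie']
print(uniq_len(lst))  # ['bob', 'james', 'bill']
print(uniq_len(lst))  # same result on a second call
```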

A not very elegant way would be:

>>> mylist = ['bob', 'james', 'jim', 'jon', 'bill', 'susie', 'jamie']
>>> filtered = []
>>> [filtered.append(x) for x in mylist if len(x) not in [len(y) for y in filtered]]
[None, None, None]
>>> print(filtered)
['bob', 'james', 'bill']

As you can see, the interpreter prints [None, None, None] because the line where we append to filtered actually produces a list of None values (the append method always returns None), which is then discarded. But that line has the side effect of populating filtered with the right values.

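A simple way, using only sorted and reduce (reduce is a built-in in Python 2; in Python 3 it has to be imported from functools):

from functools import reduce  # not needed on Python 2

reduce(
    # keep o2 only if its length differs from the last element kept so far
    lambda o1, o2: o1 if o1 and len(o1[-1]) == len(o2) else o1 + [o2],
    sorted(orig, key=len),  # orig is the input list
    []
)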

The sort gives this O(n * log(n)) complexity.

Because sorted is stable, the ordering between equal-length strings will be the same as it was before sorting. The reduce call then keeps only the first occurrence of each length. Note that the result is ordered by length, not by position in the original list.
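The same sort-then-keep-first idea can also be sketched with itertools.groupby, which groups adjacent items with equal keys. The function name first_per_length_sorted is made up for illustration.

```python
from itertools import groupby


def first_per_length_sorted(items):
    """Return the first item of each length, ordered by length."""
    # sorted is stable, so items of equal length keep their original order
    by_len = sorted(items, key=len)
    # after sorting, equal lengths are adjacent; take the first of each group
    return [next(group) for _, group in groupby(by_len, key=len)]


lst = ['bob', 'james', 'jim', 'jon', 'bill', 'susie', 'jamie']
print(first_per_length_sorted(lst))  # ['bob', 'bill', 'james']
```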

List comprehensions are a good way to make your code more pythonic. Here's a good explanation of how they work: List Comprehensions.

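For example, to keep only the elements whose length occurs exactly once in the list (note that this is a stricter filter than keeping the first element of each length):

from collections import Counter

def filterUniqueLengths(myList):
    # Counter is used instead of groupby here, because groupby only merges
    # adjacent items and would miscount lengths in an unsorted list
    lengths = Counter(len(e) for e in myList)
    # keep only the elements whose length occurs exactly once
    return [e for e in myList if lengths[len(e)] == 1]

a = ['hello', 'hello', 'goodbye']
print(filterUniqueLengths(a))

# prints ['goodbye']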
