
Get list of words all starting with the same letter from multiple lists

I'm working on a side project and I've encountered this problem. Basically, the input I'm dealing with is a list of lists, where the inner lists look something like this:

- ['operating', 'alive', 'effective', 'rapid', 'progressive', 'working', 'mobile']
- ['enjoyable', 'pleasant', 'entertaining', 'amusing', 'lively', 'boisterous', 'convivial', 'merry', 'witty']

There can be any number of inner lists (but I've considered creating a limitation). What I want to achieve is to return lists of words from each of the lists that begin with the same letter. For example, from the above, we'd get something like:

[alive, amusing], [effective, enjoyable], [effective, entertaining], [progressive, pleasant] ...

My question is, what is a good approach? I've considered going through the entire alphabet and using a boolean array to keep track of which letters had a word in each list starting with that letter, but it seems inefficient, and I'm not satisfied with the approach.

For example (not complete, but just for reference..):

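from string import ascii_lowercase

# description is the list of inner word lists
d = dict.fromkeys(ascii_lowercase, False)
for c in ascii_lowercase:
    found = False
    for item in description:
        for syn in item:
            if syn.startswith(c):
                found = True
    # True if any word in any inner list starts with c
    d[c] = found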

And then just grabbing the words starting with the letters marked 'True' from each list to build the output list.

Am I missing a simpler approach? I'm new to Python, so I'm not sure if I'm missing a built in function that could be helpful in this case.

Thanks for reading!

One option could be to sort a flattened version of your list, then use groupby with a custom key to get the different first letters as groups.

[list(grp) for _,grp in groupby(sorted(chain.from_iterable(li)), key=itemgetter(0))]

Example

>>> from itertools import groupby, chain
>>> from operator import itemgetter

>>> li = [['operating', 'alive', 'effective', 
           'rapid', 'progressive', 'working', 'mobile'], 
          ['enjoyable', 'pleasant', 'entertaining', 'amusing',
           'lively', 'boisterous', 'convivial', 'merry', 'witty']]

>>> [list(grp) for _,grp in 
     groupby(sorted(chain.from_iterable(li)), key=itemgetter(0))]
[['alive', 'amusing'],
 ['boisterous'],
 ['convivial'],
 ['effective', 'enjoyable', 'entertaining'],
 ['lively'],
 ['merry', 'mobile'],
 ['operating'],
 ['pleasant', 'progressive'],
 ['rapid'],
 ['witty', 'working']]
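
The sorting step matters because groupby only groups consecutive items that share a key. A minimal sketch of the difference, using a shortened word list chosen here purely for illustration:

```python
from itertools import groupby
from operator import itemgetter

# Shortened sample list (an assumption for illustration, not the full input).
words = ['operating', 'alive', 'effective', 'amusing', 'enjoyable']

# Without sorting, 'alive' and 'amusing' are not adjacent,
# so groupby puts them in separate groups.
unsorted_groups = [list(grp) for _, grp in groupby(words, key=itemgetter(0))]

# Sorting first makes words with the same first letter adjacent.
sorted_groups = [list(grp) for _, grp in groupby(sorted(words), key=itemgetter(0))]

print(unsorted_groups)
print(sorted_groups)
```

Note that itemgetter(0) assumes every word is a non-empty string.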

A list comprehension makes the job much simpler!

Iterate through every element of the second inner list, l[1], as j, and for each j iterate through the first inner list, l[0], as i. If j starts with the same letter as i, add the pair to the result. Note that this handles exactly two inner lists.

>>> l
[['operating', 'alive', 'effective', 'rapid', 'progressive', 'working', 'mobile'], ['enjoyable', 'pleasant', 'entertaining', 'amusing', 'lively', 'boisterous', 'convivial', 'merry', 'witty']]

>>> [[i,j] for j in l[1] for i in l[0] if j.startswith(i[0])]
[['effective', 'enjoyable'], ['progressive', 'pleasant'], ['effective', 'entertaining'], ['alive', 'amusing'], ['mobile', 'merry'], ['working', 'witty']]
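
Since the question allows any number of inner lists, the same idea could be generalized along these lines. This is only a sketch: the function name same_letter_combinations is hypothetical, and it assumes every word is a non-empty string.

```python
from collections import defaultdict
from itertools import product


def same_letter_combinations(lists):
    """Return every combination of one word per inner list, all sharing a first letter."""
    # For each inner list, map first letter -> words starting with that letter.
    grouped = []
    for inner in lists:
        by_letter = defaultdict(list)
        for word in inner:
            by_letter[word[0]].append(word)
        grouped.append(by_letter)

    # Keep only the letters that occur in every inner list.
    common = set(grouped[0]).intersection(*grouped[1:]) if grouped else set()

    result = []
    for letter in sorted(common):
        # product picks one word from each inner list for this letter.
        result.extend(list(combo) for combo in product(*(g[letter] for g in grouped)))
    return result


l = [['operating', 'alive', 'effective', 'rapid', 'progressive', 'working', 'mobile'],
     ['enjoyable', 'pleasant', 'entertaining', 'amusing', 'lively', 'boisterous',
      'convivial', 'merry', 'witty']]

print(same_letter_combinations(l))
```

With the two sample lists this produces the same pairs as the comprehension above, ordered by letter rather than by position in l[1].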

I'd use a dictionary that maps each starting character to a list of words ("char": listOfWords), and fill it while iterating over your lists.

For each element of every list:

if the dictionary already contains the character the element starts with,

add the element to that character's list;

else

create a new dictionary entry for that starting character, initialize its list, and add the element to the new list.

The resulting dictionary will be something like:

"a":[alive, amusing],"b":[boisterous],"c":[convivial], ...

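The steps above can be sketched with a plain dict as follows. The variable names lists and groups are chosen here for illustration, and the sketch assumes every word is a non-empty string.

```python
lists = [
    ['operating', 'alive', 'effective', 'rapid', 'progressive', 'working', 'mobile'],
    ['enjoyable', 'pleasant', 'entertaining', 'amusing', 'lively', 'boisterous',
     'convivial', 'merry', 'witty'],
]

groups = {}
for inner in lists:
    for word in inner:
        char = word[0]
        if char in groups:
            # The starting character is already a key: extend its list.
            groups[char].append(word)
        else:
            # First word seen for this character: create its list.
            groups[char] = [word]

print(groups)
```
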
Use a dictionary that maps each letter to a list of words. This is some sample code:

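from collections import defaultdict

# Maps each first letter to the list of words starting with it.
letterWordsDict = defaultdict(list)

# Let ls contain sub-lists of words.
for subls in ls:
    for word in subls:
        letterWordsDict[word[0]].append(word)

# In Python 3, values() returns a view, so wrap it in list() to get a list of lists.
groupedWords = list(letterWordsDict.values())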

If you want to list the words that start with the same character, you can use the following snippet.

Python 3 (assuming the words contain only lowercase letters):

import string 

outer = [
    ['operating', 'alive', 'effective', 'rapid', 'progressive', 'working', 'mobile'],
    ['enjoyable', 'pleasant', 'entertaining', 'amusing', 'lively', 'boisterous', 'convivial', 'merry', 'witty']
]

# One empty list per lowercase letter.
data = {c: [] for c in string.ascii_lowercase}
for inner in outer:
    for word in inner:
        data[word[0]].append(word)

# Collect the non-empty groups in alphabetical order.
flat_list = []
for character in sorted(data):
    if data[character]:
        flat_list.append(sorted(data[character]))

print(flat_list)

Output:

[['alive', 'amusing'], ['boisterous'], ['convivial'], ['effective', 'enjoyable', 'entertaining'], ['lively'], ['merry', 'mobile'], ['operating'], ['pleasant', 'progressive'], ['rapid'], ['witty', 'working']]

I flattened the list of lists first, then sorted it by first letter so that groupby could group on that key. Finally, I extracted each group into a list and wrapped the whole thing in a list as the result.

>>> from operator import itemgetter
>>> from itertools import chain, groupby

>>> items = [['operating', 'alive', 'effective', 'rapid', 'progressive', 'working', 'mobile'], ['enjoyable', 'pleasant', 'entertaining', 'amusing', 'lively', 'boisterous', 'convivial', 'merry', 'witty']]

>>> first_item = itemgetter(0)

>>> flattened_items = chain.from_iterable(items)

>>> list(list(gitems) for _, gitems in groupby(sorted(flattened_items, key=first_item), key=first_item))

[['alive', 'amusing'], ['boisterous'], ['convivial'], ['effective', 'enjoyable', 'entertaining'], ['lively'], ['mobile', 'merry'], ['operating'], ['progressive', 'pleasant'], ['rapid'], ['working', 'witty']]


 