
How to extract words starting with letters from A to L from a text in Python?

Language: Python

I extracted named entities from a text without using external libraries. Then I wanted to create two lists: one for the names starting with A-L, the other for the names starting with M-Z. I wrote this code, but it did not produce any output. Does anyone see what's wrong here? Note that I am a beginner at coding in general and might not be familiar with the terminology.

AL = []
MZ = []
for match in matches:
    if ((match[0] >= "A") and (match[0] <= "L")):
        AL.append(match)
        print("Words between A-L are:  ", AL)
    elif ((match[0] >= "M" and match[0] <= "Z")):
        MZ.append(match)
        print("Words between M-Z are: ", MZ)

edit:

I was asked where "matches" comes from:

import re

pattern = re.compile(r'\.?\s\b[A-Z][a-z]\w+')
matches = pattern.findall(text)  # text holds the source document

# print(matches)

And here is the output list whose names I am trying to split into A-L and M-Z:

matches = ['Rossini', 'William Tell', 'America', 'Athabaskan', 'Mackenzie River', 'Morse', 'Trappist', 'Plains', 'India']

What I meant by "It does not work" is that it prints empty lists:

Names between A and L:  []
Names between M and Z:  []

Thank you all for your contributions.

So look at the following example:

def filter_words(words):
    # Split words into three lists by their first character.
    AL = []
    MZ = []
    unclassified = []
    for word in words:
        if "A" <= word[0] <= "L":
            AL.append(word)
        elif "M" <= word[0] <= "Z":
            MZ.append(word)
        else:
            # Lower-case initials (and anything else outside A-Z) end up here.
            unclassified.append(word)
    return AL, MZ, unclassified
    

AL, MZ, unclassified = filter_words(["Al", "Bob", "Todd", "Zack", "todd", "zack"])

print(AL)
print(MZ)
print(unclassified)

OUTPUT

['Al', 'Bob']
['Todd', 'Zack']
['todd', 'zack']

Depending on your requirements, you may want to call word.upper() before the if statements: as the output shows, a name that starts with a lower-case letter ends up unclassified.
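
A minimal sketch of that idea, assuming you want to keep each name's original spelling: only the first character is upper-cased for the comparison, and the word is stored unchanged. The function name filter_words_ignore_case and the extra sample "42nd" are made up for this illustration.

```python
def filter_words_ignore_case(words):
    # Hypothetical variant of filter_words: compare on the upper-cased
    # first character, but store the word exactly as it was given.
    AL, MZ, unclassified = [], [], []
    for word in words:
        # word[:1] is "" for an empty string, so this never raises IndexError.
        first = word[:1].upper()
        if "A" <= first <= "L":
            AL.append(word)
        elif "M" <= first <= "Z":
            MZ.append(word)
        else:
            unclassified.append(word)
    return AL, MZ, unclassified


AL, MZ, unclassified = filter_words_ignore_case(
    ["Al", "Bob", "Todd", "Zack", "todd", "zack", "42nd"]
)
print(AL)            # ['Al', 'Bob']
print(MZ)            # ['Todd', 'Zack', 'todd', 'zack']
print(unclassified)  # ['42nd']
```

Names that start with a digit or punctuation still land in unclassified, which makes it easy to spot entries that are not what you expected.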

I guess the main problem is that you are only checking for upper-case letters, so the test fails whenever match[0] is lower case.

Also, you are printing on every iteration; you should let the whole loop run and print once at the end. Here is a version that does both:

AL = []
MZ = []
for match in matches:
    first = match[0].lower()  # compare case-insensitively
    if "a" <= first <= "l":
        AL.append(match)
    elif "m" <= first <= "z":
        MZ.append(match)
        
print("Words between A-L are:  ", AL)
print("Words between M-Z are: ", MZ)

If this still doesn't work, please share the contents of matches too. Also, your code doesn't account for the case where neither condition is true.
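
One thing worth checking, assuming matches really comes from the pattern shown in the question: that pattern begins with \.?\s, so every string findall returns starts with a period or a whitespace character, never with a letter. In that case neither branch ever runs, which would explain both the empty lists and the missing print output. The list pasted in the question does not show such prefixes, so this may not match your actual run; printing the repr of each match will tell. The sample text and the capture-group version of the pattern below are assumptions for illustration, not taken from the question.

```python
import re

# Made-up sample text; the question does not show the real one.
text = "The opera William Tell is by Rossini. India and America appear later."

# Pattern from the question: each match includes the leading "." / whitespace.
pattern = re.compile(r'\.?\s\b[A-Z][a-z]\w+')
matches = pattern.findall(text)
print([repr(m) for m in matches])  # repr makes the leading characters visible

# Assumed fix: capture only the word, so findall returns it without the prefix.
word_pattern = re.compile(r'\.?\s\b([A-Z][a-z]\w+)')
words = word_pattern.findall(text)

AL = [w for w in words if "A" <= w[0] <= "L"]
MZ = [w for w in words if "M" <= w[0] <= "Z"]
print("Words between A-L are: ", AL)
print("Words between M-Z are: ", MZ)
```

If you would rather keep the original pattern, stripping the prefix from each match before the comparison (for example with match.lstrip(". \n\t")) should have the same effect.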

You have to take the letter case of the words into account:

matches = ['Arbre', 'Amie', 'Maison', 'Ligne', 'Zebre', 'Maths']

AL = []
MZ = []
for match in matches:
    match = match.upper()  # note: the upper-cased word is what gets stored
    if "A" <= match[0] <= "L":
        AL.append(match)
    elif "M" <= match[0] <= "Z":
        MZ.append(match)
        
print("Words between A-L are:  ", AL)
print("Words between M-Z are: ", MZ)

output is:

Words between A-L are:   ['ARBRE', 'AMIE', 'LIGNE']
Words between M-Z are:  ['MAISON', 'ZEBRE', 'MATHS']


 