
How to extract words starting with letters from A to L from a text in Python?

Language: Python

So I extracted named entities from a text without using external libraries. Then I wanted to create two lists: one for the names starting with A-L, the other for the names starting with M-Z. I wrote this code, but it did not return any output. Does anyone see what's wrong here? Note that I am a beginner in coding in general and might not be familiar with the terminology.

AL = []
MZ = []
for match in matches:
    if ((match[0] >= "A") and (match[0] <= "L")):
        AL.append(match)
        print("Words between A-L are:  ", AL)
    elif ((match[0] >= "M" and match[0] <= "Z")):
        MZ.append(match)
        print("Words between M-Z are: ", MZ)

Edit:

I was asked where "matches" came from:

import re

pattern = re.compile(r'\.?\s\b[A-Z][a-z]\w+')
matches = pattern.findall(text)


# print(matches)

And here is the output list whose names I am trying to sort into A-L and M-Z:

matches = ['Rossini', 'William Tell', 'America', 'Athabaskan', 'Mackenzie River', 'Morse', 'Trappist', 'Plains', 'India']

What I meant by "it does not work" is that it returns empty lists:

Names between A and L:  []
Names between M and Z:  []
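A likely cause worth checking, offered as a sketch rather than a confirmed diagnosis: the pattern above consumes an optional period and one whitespace character before each capitalised word, so findall returns strings such as ' Rossini' whose first character is a space or a period. Neither of those is between "A" and "L" or between "M" and "Z", so both lists stay empty. (The list shown above may have been cleaned up afterwards, since this pattern cannot match a two-word name like 'William Tell' in one piece.) The sample text below is hypothetical; wrapping the name in a capturing group is one possible fix, because findall then returns only the group.

```python
import re

# The pattern from the question: \.?\s consumes an optional period plus one
# whitespace character BEFORE the capitalised word.
pattern = re.compile(r'\.?\s\b[A-Z][a-z]\w+')

# Hypothetical sample text, assumed for illustration only.
text = "The overture by Rossini was played in America. Morse code came later."

matches = pattern.findall(text)
print(matches)                   # [' Rossini', ' America', '. Morse']
print([m[0] for m in matches])   # [' ', ' ', '.'] -> in neither A-L nor M-Z

# With a capturing group, findall returns only the group's text,
# so the leading period/whitespace is no longer part of each match.
name_pattern = re.compile(r'\.?\s\b([A-Z][a-z]\w+)')
names = name_pattern.findall(text)
print(names)                     # ['Rossini', 'America', 'Morse']
```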

Thank you all for your contributions.

Look at the following example:

def filter_words(words):
    AL = []
    MZ = []
    unclassified = []
    for word in words:
        if ((word[0] >= "A") and (word[0] <= "L")):
            AL.append(word)
        elif ((word[0] >= "M" and word[0] <= "Z")):
            MZ.append(word)
        else:
            unclassified.append(word)
    return AL, MZ, unclassified
    

AL, MZ, unclassified = filter_words(["Al", "Bob", "Todd", "Zack", "todd", "zack"])

print(AL)
print(MZ)
print(unclassified)

OUTPUT

['Al', 'Bob']
['Todd', 'Zack']
['todd', 'zack']

Depending on your requirements, you may want to call word.upper() before the if statements because, as you can see, a name that starts with a lowercase letter ends up unclassified.
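A minimal sketch of that suggestion, written as a variant of filter_words above: only the first letter is uppercased for the comparison, so the original spelling is kept in the output lists. The chained comparisons and the word[:1] slice (which yields '' instead of raising IndexError for an empty string) are illustrative choices, and '3M' is a hypothetical extra input.

```python
def filter_words(words):
    """Split words into A-L, M-Z and unclassified, ignoring case."""
    al, mz, unclassified = [], [], []
    for word in words:
        # Compare the uppercased first letter so 'todd' is grouped like 'Todd';
        # the word itself is appended unchanged.
        first = word[:1].upper()  # '' for an empty string, avoiding IndexError
        if "A" <= first <= "L":
            al.append(word)
        elif "M" <= first <= "Z":
            mz.append(word)
        else:
            unclassified.append(word)
    return al, mz, unclassified


AL, MZ, unclassified = filter_words(["Al", "Bob", "Todd", "Zack", "todd", "zack", "3M"])
print(AL)            # ['Al', 'Bob']
print(MZ)            # ['Todd', 'Zack', 'todd', 'zack']
print(unclassified)  # ['3M']
```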

I guess the main problem may be that you are checking for uppercase values, so if match[0] is lowercase, it won't fall into either range.

Also, you are printing at every iteration; you should wait for the entire loop to finish and then print. Here:

AL= []
MZ = []
for match in matches:
    if ((match[0].lower() >= "a") and (match[0].lower() <= "l")):
        AL.append(match)

    elif ((match[0].lower() >= "m" and match[0].lower() <= "z")):
        MZ.append(match)
        
print("Words between A-L are:  ", AL)
print("Words between M-Z are: ", MZ)

If this still doesn't work, please share the matches object too. Also, your code doesn't account for the situation where neither condition is true.
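To cover the case where neither condition is true, an else branch can collect whatever fits neither range instead of dropping it silently. The sketch below uses the list from the question plus one hypothetical entry with a leading space; split_names, FIRST_HALF and SECOND_HALF are illustrative names, and testing membership in sets built from string.ascii_uppercase is simply an alternative to the >= / <= comparisons.

```python
import string

# 'A'..'L' and 'M'..'Z' as sets built from the standard-library alphabet.
FIRST_HALF = set(string.ascii_uppercase[:12])
SECOND_HALF = set(string.ascii_uppercase[12:])


def split_names(matches):
    """Return (A-L names, M-Z names, names that fit neither range)."""
    al, mz, neither = [], [], []
    for match in matches:
        first = match[:1].upper()  # '' for an empty string
        if first in FIRST_HALF:
            al.append(match)
        elif first in SECOND_HALF:
            mz.append(match)
        else:
            # Without this branch such items would vanish silently.
            neither.append(match)
    return al, mz, neither


# The list from the question, plus a hypothetical ' Rossini' with a leading space.
matches = ['Rossini', 'William Tell', 'America', 'Athabaskan', 'Mackenzie River',
           'Morse', 'Trappist', 'Plains', 'India', ' Rossini']
AL, MZ, neither = split_names(matches)
print("Words between A-L are: ", AL)
print("Words between M-Z are: ", MZ)
print("Fit neither range:     ", neither)
```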

Here you have to take care of the word case:

matches = ['Arbre', 'Amie', 'Maison', 'Ligne', 'Zebre', 'Maths']

AL = []
MZ = []
for match in matches:
    match = match.upper()
    if ((match[0] >= "A") and (match[0] <= "L")):
        AL.append(match)
    elif ((match[0] >= "M" and match[0] <= "Z")):
        MZ.append(match)
        
print("Words between A-L are:  ", AL)
print("Words between M-Z are: ", MZ)

Output:

Words between A-L are:   ['ARBRE', 'AMIE', 'LIGNE']
Words between M-Z are:  ['MAISON', 'ZEBRE', 'MATHS']

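Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.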

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM