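How to extract words starting with letters from A to L from a text in Python?
Language: Python
I am extracting named entities from a text without using external libraries. I then want to create two lists: one for names that start with A-L and another for names that start with M-Z. I wrote this code, but it does not return any output. Does anyone see what is wrong here? Please note that I am a beginner at coding in general and may not be familiar with the terminology.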
AL = []
MZ = []
for match in matches:
    if ((match[0] >= "A") and (match[0] <= "L")):
        AL.append(match)
        print("Words between A-L are: ", AL)
    elif ((match[0] >= "M" and match[0] <= "Z")):
        MZ.append(match)
        print("Words between M-Z are: ", MZ)
Edit:
Someone asked me where "matches" comes from:
import re
pattern = re.compile(r'\.?\s\b[A-Z][a-z]\w+')
matches = pattern.findall(text)
# print(matches)
This is the output list, in which I am trying to sort the names into A-L and M-Z:
matches = ['Rossini', 'William Tell', 'America', 'Athabaskan', 'Mackenzie River', 'Morse', 'Trappist', 'Plains', 'India']
What I mean by "it doesn't work" is that it returns empty brackets.
Names between A and L: []
Names between M and Z: []
Thank you all for your contributions.
So look at the following example:
def filter_words(words):
    AL = []
    MZ = []
    unclassified = []
    for word in words:
        if ((word[0] >= "A") and (word[0] <= "L")):
            AL.append(word)
        elif ((word[0] >= "M" and word[0] <= "Z")):
            MZ.append(word)
        else:
            unclassified.append(word)
    return AL, MZ, unclassified
AL, MZ, unclassified = filter_words(["Al", "Bob", "Todd", "Zack", "todd", "zack"])
print(AL)
print(MZ)
print(unclassified)
OUTPUT
['Al', 'Bob']
['Todd', 'Zack']
['todd', 'zack']
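Depending on your requirements, you may want to call word.upper() before the if statements, given that, as you can see, names starting with a lowercase letter end up unclassified.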
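A minimal sketch of that suggestion, assuming you want to keep each word's original spelling: only the first character is upper-cased for the comparison, and the word itself is stored unchanged. The function name `filter_words_ignore_case` is hypothetical and not part of the question.

```python
def filter_words_ignore_case(words):
    """Split words into A-L, M-Z and unclassified, ignoring the case of the first letter."""
    AL, MZ, unclassified = [], [], []
    for word in words:
        # word[:1] is "" for an empty string, so empty input cannot raise IndexError.
        first = word[:1].upper()
        if "A" <= first <= "L":
            AL.append(word)
        elif "M" <= first <= "Z":
            MZ.append(word)
        else:
            unclassified.append(word)
    return AL, MZ, unclassified


AL, MZ, unclassified = filter_words_ignore_case(
    ["Al", "Bob", "Todd", "Zack", "todd", "zack", "42nd"]
)
print(AL)            # ['Al', 'Bob']
print(MZ)            # ['Todd', 'Zack', 'todd', 'zack']
print(unclassified)  # ['42nd']
```

Anything whose first character is not an ASCII letter, such as a digit or a space, still lands in `unclassified`, which makes unexpected input easy to spot.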
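I think the main problem you may be running into is that you are comparing against uppercase letters, which will not work if match[0] is lowercase.
Also, you are printing on every iteration; you should wait for the whole loop to finish and then print. Here: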
AL = []
MZ = []
for match in matches:
    if ((match[0].lower() >= "a") and (match[0].lower() <= "l")):
        AL.append(match)
    elif ((match[0].lower() >= "m" and match[0].lower() <= "z")):
        MZ.append(match)
print("Words between A-L are: ", AL)
print("Words between M-Z are: ", MZ)
If this still does not work, please share the matches object as well. Also, your code does not handle the case where neither condition is true.
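One thing worth checking, offered as a guess rather than a confirmed diagnosis: the pattern in the question begins with `\.?\s`, so every string returned by `findall` starts with a whitespace character (optionally preceded by a period), and `match[0]` is then never a letter. The list posted in the question shows no leading spaces, so this may not apply to the actual data. The sketch below uses a made-up `text`, since the original is not shown, and strips the leading characters before classifying.

```python
import re

# Hypothetical sample text; the original `text` is not shown in the question.
text = "The opera by Rossini reached America and later India."

pattern = re.compile(r'\.?\s\b[A-Z][a-z]\w+')
matches = pattern.findall(text)
print(matches)  # [' Rossini', ' America', ' India']  (note the leading spaces)

AL, MZ = [], []
for match in matches:
    # Drop the optional leading period, then the whitespace captured by the pattern.
    name = match.lstrip(".").lstrip()
    if "A" <= name[0] <= "L":
        AL.append(name)
    elif "M" <= name[0] <= "Z":
        MZ.append(name)

print("Words between A-L are: ", AL)
print("Words between M-Z are: ", MZ)
```

With the unstripped strings, `match[0]` is a space, which sorts before "A", so both branches are skipped and both lists stay empty.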
Here, you have to pay attention to the case of the words:
matches = ['Arbre', 'Amie', 'Maison', 'Ligne', 'Zebre', 'Maths']
AL = []
MZ = []
for match in matches:
    match = match.upper()
    if ((match[0] >= "A") and (match[0] <= "L")):
        AL.append(match)
    elif ((match[0] >= "M" and match[0] <= "Z")):
        MZ.append(match)
print("Words between A-L are: ", AL)
print("Words between M-Z are: ", MZ)
The output is:
Words between A-L are: ['ARBRE', 'AMIE', 'LIGNE']
Words between M-Z are: ['MAISON', 'ZEBRE', 'MATHS']