Read out definitions from a text-file-based dictionary
I'm trying to write a Python function that takes a text-file-based dictionary as input, for example Webster's free dictionary. The function "webster_definition" will then search through the text file and print the definition of a specific word, e.g. "Canada".
Here is what I've got so far:
import re
import sys

def webster_definition(word):
    word = word.upper()
    output = ""
    doc = open("webster.txt", 'r')
    for line in doc:
        if re.match(word, line):
            print(line)
    return output

print(webster_definition("Canada"))
This outputs the word I looked for, but the definition starts three lines later with "Defn:" and is of variable length, e.g.:
CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
-- Canada goose. (Zoöl.) See Whisky Jack.
-- Canada lynx. (Zoöl.) See Lynx.
-- Canada porcupine (Zoöl.) See Porcupine, and Urson.
-- Canada rice (Bot.) See under Rick.
-- Canada robin (Zoöl.), the cedar bird.
The desired output should look like this:
CANADA
Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
-- Canada goose. (Zoöl.) See Whisky Jack.
-- Canada lynx. (Zoöl.) See Lynx.
-- Canada porcupine (Zoöl.) See Porcupine, and Urson.
-- Canada rice (Bot.) See under Rick.
-- Canada robin (Zoöl.), the cedar bird.
Can anyone help me output the definition?
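One way to finish the function above is sketched below. It assumes the entry layout shown in the sample: a headword line, a pronunciation line, a blank line, then a "Defn:" paragraph that ends at the next blank line. It reads line by line, keeps the headword, skips ahead to the "Defn:" line, and collects lines until the next blank one. The path parameter and the UTF-8 encoding are assumptions, not part of the question.

```python
def webster_definition(word, path="webster.txt"):
    """Return the headword line plus its "Defn:" paragraph, or "" if absent.

    Assumed entry layout (from the sample in the question): a HEADWORD line,
    a pronunciation line, a blank line, then a "Defn:" paragraph that ends
    at the next blank line. The UTF-8 encoding is also an assumption.
    """
    word = word.upper()
    output = []
    state = "searching"  # searching -> skipping -> collecting
    with open(path, encoding="utf-8") as doc:
        for line in doc:
            text = line.rstrip("\r\n")
            if state == "searching":
                # Compare the whole line, so "CANADA" cannot match a longer headword.
                if text == word:
                    output.append(text)
                    state = "skipping"
            elif state == "skipping":
                # Drop the pronunciation line and the blank line until "Defn:" appears.
                if text.startswith("Defn:"):
                    output.append(text)
                    state = "collecting"
            else:
                # Collect the definition until the next blank line.
                if not text.strip():
                    break
                output.append(text)
    return "\n".join(output)
```

Call it as print(webster_definition("Canada")). Comparing the whole line instead of using re.match avoids hits on longer headwords that merely start with the same letters. If an entry has no "Defn:" line (for example, if its senses are numbered), the function would skip ahead and return the next entry's "Defn:" paragraph, so treat it as a starting point.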
Given this content in the file:
CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
-- Canada goose. (Zoöl.) See Whisky Jack.
-- Canada lynx. (Zoöl.) See Lynx.
-- Canada porcupine (Zoöl.) See Porcupine, and Urson.
-- Canada rice (Bot.) See under Rick.
-- Canada robin (Zoöl.), the cedar bird.

ANOTHER DEFINITION
another definition

Defn: some words
more words
......
import re

with open('webster_file', 'r') as f:
    # read the whole file into a string
    data = f.read()

# upper-case the word to search for (headwords are upper-case in the file)
word = 'canada'.upper()

# match the headword, then everything (non-greedy) up to the first empty line,
# then everything (non-greedy) up to the next empty line
pattern = '^' + re.escape(word) + '.*?\n^$\n.*?^$'
mo = re.search(pattern, data, re.M | re.DOTALL)
if mo:
    print(mo.group(0))
Output:
CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
-- Canada goose. (Zoöl.) See Whisky Jack.
-- Canada lynx. (Zoöl.) See Lynx.
-- Canada porcupine (Zoöl.) See Porcupine, and Urson.
-- Canada rice (Bot.) See under Rick.
-- Canada robin (Zoöl.), the cedar bird.
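The match above still includes the pronunciation line (Can"a*da, n.), which the question wants dropped. A minimal variant, assuming the same blank-line layout and plain "\n" line endings (which text-mode open() normally produces), captures the headword and the definition paragraph in separate groups. re.escape guards against regex metacharacters in the word, and anchoring the headword with $ keeps "CANADA" from matching a longer headword that merely starts with it. The function name find_by_regex is hypothetical.

```python
import re


def find_by_regex(data, word):
    """Return the headword line plus its definition paragraph, or None.

    Assumes (like the answer above) that paragraphs are separated by blank
    lines and that lines end in a plain newline.
    """
    pattern = (
        r"^(" + re.escape(word.upper()) + r")$"  # headword alone on its line
        r".*?\n\n"                               # pronunciation paragraph up to the blank line
        r"(.*?)(?:\n\n|\Z)"                      # definition, up to the next blank line or the end
    )
    mo = re.search(pattern, data, re.M | re.DOTALL)
    if mo is None:
        return None
    return mo.group(1) + "\n" + mo.group(2).rstrip("\n")
```

Usage: with open('webster_file') as f: print(find_by_regex(f.read(), 'canada')).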
I am not sure I completely follow.
But if you do not want to print empty lines, you can check for them and skip them, e.g.:
for line in doc:
    # skip empty lines (LF, CRLF, or the empty string)
    if line in ['\n', '\r\n', '']:
        continue
    print(line, end='')  # line already ends with its own newline
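The membership test above only recognises three exact strings, so a line containing nothing but spaces or a tab would still be printed. A slightly more forgiving check, sketched here with a hypothetical helper name, relies on str.strip(), which with no argument removes all leading and trailing whitespace, including "\r" and "\n":

```python
def is_blank(line):
    # str.strip() with no argument removes spaces, tabs, "\r" and "\n",
    # so empty and whitespace-only lines both reduce to "" (which is falsy).
    return not line.strip()
```

Inside the loop, "if is_blank(line): continue" then replaces the list check.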
# coding: utf-8
# (the coding line is needed on Python 2 because the text contains "ö")
data = '''CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
-- Canada goose. (Zoöl.) See Whisky Jack.
-- Canada lynx. (Zoöl.) See Lynx.
-- Canada porcupine (Zoöl.) See Porcupine, and Urson.
-- Canada rice (Bot.) See under Rick.
-- Canada robin (Zoöl.), the cedar bird.'''
# split into paragraphs at the blank line
data = data.split('\n\n')
# keep only the headword line of the first paragraph, plus the whole definition paragraph
data = [data[0].split('\n')[0]] + [data[1]]
data = '\n'.join(data)
print(data)
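The snippet above handles a single entry held in a string. Below is a minimal sketch of applying the same split to a whole file's contents. It assumes entries strictly alternate between a header paragraph (headword plus pronunciation) and a definition paragraph, all separated by blank lines; a single extra paragraph, such as a header at the top of the file, would shift every pair. The name find_entry is hypothetical.

```python
def find_entry(text, word):
    """Return the headword line plus its definition paragraph, or None.

    Assumed layout: paragraphs separated by blank lines, alternating
    header (headword + pronunciation) and definition, as in the sample.
    """
    word = word.upper()
    # Split on blank lines; drop the empty pieces that runs of extra blank lines leave behind.
    paragraphs = [p.strip("\n") for p in text.split("\n\n") if p.strip()]
    # Pair each header paragraph with the definition paragraph that follows it.
    for header, definition in zip(paragraphs[0::2], paragraphs[1::2]):
        headword = header.split("\n")[0]
        if headword.strip() == word:
            return headword + "\n" + definition
    return None
```

Usage: with open('webster.txt', encoding='utf-8') as f: print(find_entry(f.read(), 'Canada')). This reads the whole file into memory, which is fine for an occasional lookup.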
It usually helps to decompose the task into simpler functions. When parsing text files, generators are very handy:
def read_paragraph(fp):
    """
    Read lines from the file until the end of the paragraph
    """
    while True:
        line = fp.readline()
        if not line.strip():
            break
        yield line

def skip_paragraph(fp):
    while fp.readline().strip():
        pass

def find_definition(fp, word):
    word = word.upper()
    while True:
        line = fp.readline()
        if not line:
            break  # end of file, consider raising an exception here
        if line.strip() == word:
            yield line  # found the word
            skip_paragraph(fp)  # move to the definition
            for defline in read_paragraph(fp):
                yield defline
            break
        else:
            # we're not interested in the current word, just skipping it
            skip_paragraph(fp)
            skip_paragraph(fp)
Finally, to get a definition:
with open('webster.txt') as fp:
    definition = find_definition(fp, 'Canada')
    # the generator reads lazily, so consume it while the file is still open
    print(''.join(definition))
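For comparison, the same read-or-skip-a-paragraph idea can be expressed with itertools.takewhile over the file's line iterator. This is a swapped-in technique rather than the answer's code; the names paragraph and definition_lines are hypothetical, and it assumes the same header/definition paragraph layout. Because it only iterates and never calls readline(), it also accepts any iterable of lines, such as a list or an io.StringIO.

```python
from itertools import takewhile


def paragraph(lines):
    """Iterate over lines up to the next blank line.

    takewhile consumes the blank line that stops it, so afterwards the
    iterator sits at the start of the next paragraph.
    """
    return takewhile(lambda line: line.strip(), lines)


def definition_lines(lines, word):
    """Yield the headword line and its definition paragraph.

    Assumes the header/definition paragraph layout used by find_definition.
    """
    word = word.upper()
    lines = iter(lines)  # share one iterator between the loop and paragraph()
    for line in lines:
        if line.strip() == word:
            yield line
            for _ in paragraph(lines):  # skip the rest of the header paragraph
                pass
            yield from paragraph(lines)  # the definition paragraph itself
            return
        # Not our word: skip the rest of its header paragraph, then its definition.
        for _ in paragraph(lines):
            pass
        for _ in paragraph(lines):
            pass
```

Usage mirrors the answer: with open('webster.txt') as fp: print(''.join(definition_lines(fp, 'Canada'))).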
That said, if you need to query the dictionary often and performance matters, consider converting the text file into an SQLite database.
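A minimal sketch of that conversion with Python's built-in sqlite3 module, assuming the text file has already been parsed into (headword, definition) pairs by one of the approaches in this thread. The table layout and the names build_index and lookup_db are hypothetical. Headwords are stored upper-case to match the file, and there is no uniqueness constraint because a dictionary can list the same headword more than once.

```python
import sqlite3


def build_index(entries, db_path=":memory:"):
    """Store (headword, definition) pairs in an SQLite table and return the connection.

    Headwords are expected upper-case, as in the text file. There is no
    UNIQUE constraint because a dictionary may list a headword more than once.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (headword TEXT NOT NULL, definition TEXT NOT NULL)"
    )
    # The index lets lookups by headword avoid scanning the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_headword ON entries (headword)")
    conn.executemany("INSERT INTO entries (headword, definition) VALUES (?, ?)", entries)
    conn.commit()
    return conn


def lookup_db(conn, word):
    """Return every stored definition for word, in insertion order."""
    rows = conn.execute(
        "SELECT definition FROM entries WHERE headword = ? ORDER BY rowid", (word.upper(),)
    )
    return [definition for (definition,) in rows]
```

Build the database once, passing a file path (for example a hypothetical "webster.db") instead of ":memory:", and reuse it for every query instead of rescanning the text file.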