Reading a definition from a text-file-based dictionary

Question

I'm trying to write a Python function that takes a text-file-based dictionary as input, for example Webster's free dictionary. The function webster_definition should then search through the text file and print the definition of a specific word, e.g. "Canada".

Here is what I've got so far:

import re
import sys

def webster_definition(word):
    word = word.upper()
    output = ""
    doc = open("webster.txt",'r')           

    for line in doc:
        if re.match(word, line):
            print(line)

    return output

print(webster_definition("Canada"))

This outputs the word I looked for. But the definition starts three lines later with "Defn:" and is of variable length, e.g.:

CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
 -- Canada goose. (Zoöl.) See Whisky Jack.
 -- Canada lynx. (Zoöl.) See Lynx.
 -- Canada porcupine (Zoöl.) See Porcupine, and Urson.
 -- Canada rice (Bot.) See under Rick.
 -- Canada robin (Zoöl.), the cedar bird.

The desired output should look like this:

CANADA
Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
 -- Canada goose. (Zoöl.) See Whisky Jack.
 -- Canada lynx. (Zoöl.) See Lynx.
 -- Canada porcupine (Zoöl.) See Porcupine, and Urson.
 -- Canada rice (Bot.) See under Rick.
 -- Canada robin (Zoöl.), the cedar bird.

Can anyone help me with the output of the definition?

Answer 1

Given this in the file:

CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
 -- Canada goose. (Zoöl.) See Whisky Jack.
 -- Canada lynx. (Zoöl.) See Lynx.
 -- Canada porcupine (Zoöl.) See Porcupine, and Urson.
 -- Canada rice (Bot.) See under Rick.
 -- Canada robin (Zoöl.), the cedar bird.

 ANOTHER DEFINITION
 another definition

 Defn.. some words
 more words
 ......


import re

with open('webster_file', 'r') as f:
    # read the whole file into a string
    data = f.read()

# uppercase the word to search for
word = 'canada'.upper()
# match the headword line, then everything (non-greedy) up to the first
# empty line, then everything (non-greedy) up to the next empty line
pattern = '^' + re.escape(word) + '.*?\n^$\n.*?^$'

mo = re.search(pattern, data, re.M | re.DOTALL)

if mo:
    print(mo.group(0))


CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
 -- Canada goose. (Zoöl.) See Whisky Jack.
 -- Canada lynx. (Zoöl.) See Lynx.
 -- Canada porcupine (Zoöl.) See Porcupine, and Urson.
 -- Canada rice (Bot.) See under Rick.
 -- Canada robin (Zoöl.), the cedar bird.
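
Note that this match still contains the pronunciation line, which the question wants dropped. One way to get only the headword and the definition is to capture the "Defn:" paragraph in a group. This is a minimal sketch, not a tested solution for the full Webster file: it assumes the headword sits alone on its line and that its entry has a paragraph starting with "Defn:" which ends at the next blank line. The name extract_definition and the shortened SAMPLE text are made up for illustration.

```python
import re

def extract_definition(data, word):
    """Return the headword plus its 'Defn:' paragraph, or None if not found."""
    pattern = (
        r'^' + re.escape(word.upper()) + r'[ \t]*\n'  # headword alone on its line
        r'(?:[^\n]+\n)*?'                             # rest of the header paragraph
        r'[ \t]*\n'                                   # the blank line after it
        r'(Defn:[^\n]*(?:\n[^\n]+)*)'                 # "Defn:" line plus following non-blank lines
    )
    mo = re.search(pattern, data, re.M)
    if mo is None:
        return None
    return word.upper() + '\n' + mo.group(1)

# Shortened sample in the format shown in the question (assumed layout).
SAMPLE = '''CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
 -- Canada rice (Bot.) See under Rick.

ANOTHER WORD
another word

Defn: some words
more words
'''

print(extract_definition(SAMPLE, 'Canada'))
```

Because re.DOTALL is not used here, every "." or "[^\n]" stays within one line, which makes it easier to reason about where the paragraph ends.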

Answer 2

I am not sure I completely follow.

But if you do not want to print empty lines, you can check for them and skip them, e.g.:

if line in ['\n', '\r\n', '']:
    continue
print(line)
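
Skipping blank lines alone would still print the pronunciation line and would not stop at the end of the entry. A sketch of how the loop from the question could track where it is: look for the headword, wait for the "Defn:" line, then emit lines until the next blank line. It assumes the headword is alone on its line and that its entry contains a "Defn:" paragraph. The name iter_definition and the sample text are illustrative only.

```python
import io

def iter_definition(lines, word):
    """Yield the headword line, then the lines of its 'Defn:' paragraph."""
    word = word.upper()
    state = "search"                      # search -> header -> definition
    for line in lines:
        text = line.rstrip("\r\n")
        if state == "search":
            if text.strip() == word:      # headword must be the whole line
                state = "header"
                yield text
        elif state == "header":
            if text.startswith("Defn:"):  # skip pronunciation and blank lines
                state = "definition"
                yield text
        else:                             # state == "definition"
            if not text.strip():          # a blank line ends the definition
                return
            yield text

# A file object works the same way; StringIO stands in for open("webster.txt").
sample = io.StringIO(
    'CANADA\n'
    'Can"a*da, n.\n'
    '\n'
    'Defn: A British province in North America, giving its name to various\n'
    'plants and animals. Canada balsam. See under Balsam.\n'
    ' -- Canada rice (Bot.) See under Rick.\n'
    '\n'
    'ANOTHER WORD\n'
)
print("\n".join(iter_definition(sample, "Canada")))
```

Comparing the stripped line with the word, instead of using re.match, avoids also matching longer headwords that merely start with the same letters.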

Answer 3

# coding: utf-8

data = '''CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.
 -- Canada goose. (Zoöl.) See Whisky Jack.
 -- Canada lynx. (Zoöl.) See Lynx.
 -- Canada porcupine (Zoöl.) See Porcupine, and Urson.
 -- Canada rice (Bot.) See under Rick.
 -- Canada robin (Zoöl.), the cedar bird.'''

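# split into paragraphs on blank lines: [headword + pronunciation, definition]
parts = data.split('\n\n')
# keep the headword (first line of the first paragraph) and the definition
result = '\n'.join([parts[0].split('\n')[0], parts[1]])
print(result)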

Answer 4

It usually helps to decompose the task into simpler functions. For parsing text files, generators are very handy:

def read_paragraph(fp):
    """ 
    Read lines from the file until the end of the paragraph 
    """
    while True:
        line = fp.readline()
        if not line.strip():
            break
        yield line

def skip_paragraph(fp):
    while fp.readline().strip():
        pass

def find_definition(fp, word):
    word = word.upper()
    while True:
        line = fp.readline()
        if not line:
            break  # end of file, consider raising an exception here

        if line.strip() == word:
            yield line          # found the word
            skip_paragraph(fp)  # move to the definition
            for defline in read_paragraph(fp):
                yield defline
            break
        else:
            # we're not interested in the current word, just skipping it
            skip_paragraph(fp)
            skip_paragraph(fp)

Finally, to get a definition:

with open('webster.txt') as fp:
    definition = find_definition(fp, 'Canada')
    print(''.join(definition))

That said, if you need to query the dictionary often and performance is a concern, consider converting the text file into an SQLite database.
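
As a rough illustration of that last suggestion, the sketch below loads the entries into an SQLite table with Python's built-in sqlite3 module and looks words up through an index. The table name entries, the helper names, and the parsing rule (an all-uppercase line is a headword, and its definition is the first following paragraph that starts with "Defn:") are assumptions made for this example, so the real file may need a more careful parser.

```python
import sqlite3

def parse_entries(lines):
    """Yield (headword, definition) pairs from dictionary text lines."""
    headword = None
    definition = None                      # list of lines once "Defn:" is seen
    for raw in lines:
        line = raw.rstrip("\r\n")
        if definition is not None:
            if line.strip():
                definition.append(line)
                continue
            yield headword, "\n".join(definition)   # blank line ends the entry
            headword, definition = None, None
        elif line.startswith("Defn:") and headword is not None:
            definition = [line]
        elif line.isupper():               # assumed: headwords are all uppercase
            headword = line.strip()
    if definition is not None:             # entry that runs to the end of the file
        yield headword, "\n".join(definition)

def build_db(conn, lines):
    conn.execute("CREATE TABLE IF NOT EXISTS entries (word TEXT, definition TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entries_word ON entries (word)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", parse_entries(lines))
    conn.commit()

def lookup(conn, word):
    """Return all stored definitions for the word (possibly an empty list)."""
    rows = conn.execute(
        "SELECT definition FROM entries WHERE word = ?", (word.upper(),)
    ).fetchall()
    return [row[0] for row in rows]

sample = '''CANADA
Can"a*da, n.

Defn: A British province in North America, giving its name to various
plants and animals. Canada balsam. See under Balsam.

ANOTHER WORD
another word

Defn: some words
more words
'''

conn = sqlite3.connect(":memory:")   # pass a file path instead to keep the data on disk
build_db(conn, sample.splitlines())
for definition in lookup(conn, "Canada"):
    print(definition)
```

The word column is indexed but not declared unique, since the same headword may appear more than once in the file. The text is parsed only once, when the database is built, and later lookups go through the index.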

4 answers

Answer 1
0 2015-10-24 10:12:32

Answer 2
0 2015-10-24 10:14:00

Answer 3
0 2015-10-24 10:19:28

Answer 4
0 accepted 2015-10-24 11:01:42
