Indexing and search in a text file

I have a text file that contains the contents of a book. I want to build an index from this file so that the user can search through it.

The search would consist of entering a word. Then, the program would return the following:

  • Every chapter which includes that word.
  • The line number of the line which contains the word.
  • The entire line the word is on.

I tried the following code:

infile =   open(file)

Dict = {}

word = input("Enter a word to search: ")

linenum = 0
line = infile.readline()
for line in infile
    linenum += 1
    for word in wordList:
        if word in line:
            Dict[word] = Dict.setdefault(word, []) + [linenum]
            print(count, word)
    line = infile.readline()

return Dict

Something like this does not work, and it seems too awkward to extend to the other features the program would need:

  • An "or" operator to search for one word or another
  • An "and" operator to search for one word and another in the same chapter

Any suggestions would be great.

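import string


def classify_lines_on_chapter(book_contents):
    # Record, for every line, the chapter heading it falls under.
    lines_vs_chapter = []
    current_chapter = None  # no heading seen yet
    for line in book_contents:
        # A line written entirely in upper case is treated as a chapter heading.
        if line.isupper():
            current_chapter = line.strip()
        lines_vs_chapter.append(current_chapter)
    return lines_vs_chapter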


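def classify_words_on_lines(book_contents):
    # Map each word to the list of (0-based) line indices it appears on.
    words_vs_lines = {}
    for i, line in enumerate(book_contents):
        # Strip surrounding punctuation; the set avoids listing a line twice.
        for word in {word.strip(string.punctuation) for word in line.split()}:
            if word:
                words_vs_lines.setdefault(word, []).append(i)
    return words_vs_lines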


def main():
    skip_lines = 93  # number of lines at the top of the file to ignore

    with open('book.txt') as book:
        book_contents = book.readlines()[skip_lines:]

    lines_vs_chapter = classify_lines_on_chapter(book_contents)
    words_vs_lines = classify_words_on_lines(book_contents)

    while True:
        word = input("Enter word to search - ")
        # Enter a blank input to exit
        if not word:
            break

        line_numbers = words_vs_lines.get(word, None)
        if not line_numbers:
            print("Word not found!!\n")
            continue

        for line_number in line_numbers:
            line = book_contents[line_number]
            chapter = lines_vs_chapter[line_number]
            print("Line " + str(line_number + 1 + skip_lines))
            print("Chapter '" + str(chapter) + "'")
            print(line)


if __name__ == '__main__':
    main()

Try it on this input file. Rename the file to book.txt before running the script.
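
The "or" and "and" searches mentioned in the question are not covered by the code above. One possible way to add them is to index words by chapter and combine the results with set union and intersection. The following is only a sketch under a few assumptions: chapter headings are upper-case lines (as in the code above), and the names build_chapter_index, search_or, search_and and the "(before first chapter)" label are made up for illustration.

```python
import string


def build_chapter_index(lines):
    """Map each word to the set of chapters it occurs in.

    Assumption: a chapter heading is a line written entirely in upper case.
    """
    word_to_chapters = {}
    # Hypothetical label for text that precedes the first heading.
    current_chapter = "(before first chapter)"
    for line in lines:
        if line.isupper():
            current_chapter = line.strip()
        for raw in line.split():
            word = raw.strip(string.punctuation)
            if word:
                word_to_chapters.setdefault(word, set()).add(current_chapter)
    return word_to_chapters


def search_or(index, first, second):
    """Chapters that contain either word (set union)."""
    return index.get(first, set()) | index.get(second, set())


def search_and(index, first, second):
    """Chapters that contain both words (set intersection)."""
    return index.get(first, set()) & index.get(second, set())


# Tiny in-memory "book" so the sketch runs without a file.
sample = [
    "CHAPTER ONE\n",
    "The whale surfaced near the ship.\n",
    "CHAPTER TWO\n",
    "The ship sailed on without the whale.\n",
    "A storm hit the harbor.\n",
]
index = build_chapter_index(sample)
print(sorted(search_and(index, "whale", "ship")))      # ['CHAPTER ONE', 'CHAPTER TWO']
print(sorted(search_or(index, "storm", "whale")))      # ['CHAPTER ONE', 'CHAPTER TWO']
print(sorted(search_and(index, "storm", "surfaced")))  # []
```

Because each word maps to a set of chapters, longer queries can be built by chaining the same | and & operators. Matching here is case-sensitive, as in the code above.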
