
How to extract certain paragraphs from a text file

    def extract_book_info(self):
        books_info = []
        for file in os.listdir(self.book_folder_path):
            title = "None"
            author = "None"
            release_date = "None"
            last_update_date = "None"
            language = "None"
            producer = "None"

            with open(self.book_folder_path + file, 'r', encoding = 'utf-8') as content:
                book_info = content.readlines()
                for lines in book_info:
                    if lines.startswith('Title'):
                        title = lines.strip().split(': ')
                       
                    elif lines.startswith('Author'):
                        try:
                            author = lines.strip().split(': ')
                           
                        except IndexError:
                            author = 'Empty'
                    elif lines.startswith('Release date'):
                        release_date = lines.strip().split(': ')
                         
                    elif lines.startswith('Last updated'):
                        last_update_date = lines.strip().split(': ')
                       
                    elif lines.startswith('Produce by'):
                        producer = lines.strip().split(': ')
                       
                    elif lines.startswith('Language'):
                        language = lines.strip().split(': ')
                        
                    elif lines.startswith('***'):
                        pass
                        

                books_info.append(Book(title, author, release_date, last_update_date, producer, language, self.book_folder_path))

        with open(self.book_info_path, 'w', encoding="utf-8") as book_file:
            for book_info in books_info:
                book_file.write(book_info.__str__() + "\n")

I am using this code to try to extract the title, author, release_date, last_update_date, language, producer, and book_path.

This is the output I currently get:

['Title', 'The Adventures of Sherlock Holmes'];;;['Author', 'Arthur Conan Doyle'];;;None;;;None;;;None;;;['Language', 'English'];;;data/books_data/;;;

What approach should I use to get the following output, which is what I am supposed to produce?

The Adventures of Sherlock Holmes;;;Arthur Conan Doyle;;;November29,2002;;;May20,2019;;;English;;;

Here is a sample of the input:

Title: The Adventures of Sherlock Holmes

Author: Arthur Conan Doyle

Release Date: November 29, 2002 [eBook #1661]
[Most recently updated: May 20, 2019]

Language: English

Character set encoding: UTF-8

Produced by: an anonymous Project Gutenberg volunteer and Jose Menendez

*** START OF THE PROJECT GUTENBERG EBOOK THE ADVENTURES OF SHERLOCK HOLMES ***

cover

str.split returns a list. You are assigning that whole list to a variable that is meant to hold a single value.

'Title: Sherlock Holmes'.split(': ')  # => ['Title', 'Sherlock Holmes']
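To make the difference concrete, here is a minimal sketch. The first line is taken from the sample input in the question; the second title is made up purely to show a value that contains a colon, and the variable names are only illustrative.

```python
line = "Title: The Adventures of Sherlock Holmes\n"

# split returns a list holding the label and the value
parts = line.strip().split(": ")
print(parts)

# index the list to keep only the value
title = parts[1]
print(title)

# maxsplit=1 keeps a value that itself contains ': ' in one piece
# (this title is a made-up example, not from the question)
key, value = "Title: Dracula: A Mystery Story".split(": ", 1)
print(value)
```

Splitting on ': ' rather than ':' also avoids a leading space in the value.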

From your requirements I gather that you want the second element of the split result each time. You can do it like this:

...
for lines in book_info:
    if lines.startswith('Author'):
        _, author = lines.strip().split(': ', 1)
    
    elif...

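Be careful: if the split result has no second element, this unpacking raises ValueError (indexing the list with [1] would raise IndexError instead, which is presumably why your code wraps the author case in a try).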
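One way to sidestep the exception altogether is str.partition, which always returns three parts and leaves the separator empty when it is not found. The sketch below is only an illustration under some assumptions: parse_header is a hypothetical helper, not part of the original code, and it stores fields in a dict keyed by the lower-cased label. Note also that the sample header says "Release Date", "[Most recently updated" and "Produced by", while the code in the question checks for 'Release date', 'Last updated' and 'Produce by', which would explain the None values in the current output.

```python
def parse_header(lines):
    """Collect 'Label: value' header fields until the '***' marker.

    Hypothetical helper for illustration; keys are lower-cased labels.
    """
    fields = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("***"):
            break  # the header ends here, the book text follows
        # '[Most recently updated: May 20, 2019]' -> drop the brackets
        if line.startswith("[") and line.endswith("]"):
            line = line[1:-1]
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # no ': ' on this line, nothing to unpack
        # drop a trailing bracketed tag such as ' [eBook #1661]'
        fields[key.lower()] = value.split(" [", 1)[0]
    return fields


sample = [
    "Title: The Adventures of Sherlock Holmes\n",
    "\n",
    "Author: Arthur Conan Doyle\n",
    "\n",
    "Release Date: November 29, 2002 [eBook #1661]\n",
    "[Most recently updated: May 20, 2019]\n",
    "\n",
    "Language: English\n",
    "\n",
    "*** START OF THE PROJECT GUTENBERG EBOOK THE ADVENTURES OF SHERLOCK HOLMES ***\n",
    "cover\n",
]

info = parse_header(sample)
print(info.get("title", "None"))
print(info.get("release date", "None"))
print(info.get("most recently updated", "None"))
print(info.get("produced by", "None"))  # missing here, so the default is used
```

Using dict.get with a default keeps the "None" placeholder from the question for fields a file does not contain.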

Also, avoid calling __str__ directly. That is exactly what the str() function does for you, so use that instead.
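As a small illustration of that last point, here is a sketch with a stand-in Book class. The class body is an assumption (the question does not show it); the ';;;' separator is inferred from the output shown in the question.

```python
class Book:
    """Hypothetical stand-in for the Book class used in the question."""

    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __str__(self):
        # ';;;' after every field, as in the question's output
        return ";;;".join([self.title, self.author]) + ";;;"


book = Book("The Adventures of Sherlock Holmes", "Arthur Conan Doyle")

# str(book) calls Book.__str__ for you
line = str(book) + "\n"
print(line, end="")
```

An f-string such as f"{book}\n" goes through the same mechanism, so it would work in the write call as well.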


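Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.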

 