
Read in a file and skip the header portion of a text file in Python

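I took a book in plain-text format from gutenberg.org and am trying to read in the text while skipping the beginning portion of the file, then parse the rest with a processing function I wrote. How can I do this?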

Here is the beginning of the text file.

> The Project Gutenberg EBook of The Kama Sutra of Vatsyayana, by Vatsyayana

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net


Title: The Kama Sutra of Vatsyayana
       Translated From The Sanscrit In Seven Parts With Preface,
       Introduction and Concluding Remarks

Author: Vatsyayana

Translator: Richard Burton
            Bhagavanlal Indrajit
            Shivaram Parashuram Bhide

Release Date: January 18, 2009 [EBook #27827]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK THE KAMA SUTRA OF VATSYAYANA ***




Produced by Bruce Albrecht, Carla Foust, Jon Noring and
the Online Distributed Proofreading Team at
http://www.pgdp.net

And here is my current code, which processes the whole file.

import string

def process_file(filename):
    """Open a file and return a dict mapping each word to its count."""
    h = dict()
    with open(filename) as fin:  # the file is closed when the block exits
        for line in fin:
            process_line(line, h)
    return h

def process_line(line, h):
    line = line.replace('-', ' ')

    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()

        h[word] = h.get(word,0)+1

Add:

for line in fin:
    if "START OF THIS PROJECT GUTENBERG EBOOK" in line:
        break

just before your own "for line in fin:" loop.
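This works because a file object is its own iterator: after the `break`, the next `for` loop continues with the line following the marker instead of starting over. A minimal sketch to illustrate, using `io.StringIO` as a stand-in for the opened file. The `skip_header` helper name and the sample text are made up for this illustration and are not part of the original code.

```python
import io

def skip_header(fin, marker="START OF THIS PROJECT GUTENBERG EBOOK"):
    """Consume lines from fin up to and including the first one containing marker.

    Hypothetical helper: returns the same iterator, now positioned just
    after the marker line.
    """
    for line in fin:
        if marker in line:
            break
    return fin

# Stand-in for a real file; any line iterator behaves the same way.
sample = io.StringIO(
    "Title: Example\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "First line of the book.\n"
    "Second line of the book.\n"
)

body = list(skip_header(sample))
print(body)  # ['First line of the book.\n', 'Second line of the book.\n']
```

Note that if the marker never appears, the first loop consumes the entire file and nothing is left to process, so it is worth checking the marker text against the actual file.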

Well, you can skip the beginning by simply reading the input until your condition is met:

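def process_file(filename):
    """Open a file and return a dict of word counts, skipping the header."""
    h = dict()
    with open(filename) as fin:
        # Consume lines up to and including the start marker.
        for line in fin:
            if line.rstrip() == "*** START OF THIS PROJECT GUTENBERG EBOOK THE KAMA SUTRA OF VATSYAYANA ***":
                break

        # The same iterator continues with the first line after the marker.
        for line in fin:
            process_line(line, h)

    return h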

Note that in this example I used line.rstrip() == "*** START OF THIS PROJECT GUTENBERG EBOOK THE KAMA SUTRA OF VATSYAYANA ***" as the criterion, but you can set it to whatever you like.
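One way to make that criterion easy to swap is to pass it in as a function. The sketch below is only an illustration under a few assumptions: the names `default_is_start` and `count_words` are hypothetical, the default test (a line beginning with "*** START OF") is a guess at a looser match than the exact title comparison, and it uses `collections.Counter` in place of the plain dict from the question. It also skips words that are empty after stripping punctuation, which the original code does not do.

```python
import string
from collections import Counter

def default_is_start(line):
    # Assumption: the header ends at a line beginning with "*** START OF".
    return line.startswith("*** START OF")

def count_words(lines, is_start=default_is_start):
    """Count words in the lines that follow the first line matching is_start."""
    lines = iter(lines)  # share one iterator between both loops
    for line in lines:
        if is_start(line):
            break
    counts = Counter()
    for line in lines:
        for word in line.replace("-", " ").split():
            word = word.strip(string.punctuation + string.whitespace).lower()
            if word:  # ignore tokens that were only punctuation
                counts[word] += 1
    return counts

text = [
    "Title: Example\n",
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n",
    "The well-known book.\n",
    "The end.\n",
]
counts = count_words(text)
print(counts["the"], counts["well"], counts["title"])  # 2 1 0
```

With a real file, the same function can be called as `count_words(fin)` inside a `with open(filename) as fin:` block, or given a different test such as `count_words(fin, is_start=lambda line: "EBOOK" in line)`.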

