Python: How do you read a chunk of text from a file without knowing how long the file actually is?

I have a file whose data is on separate lines, except for the last piece, a biography, which may stretch across many lines. The biography may be any number of lines long; all I know is that it starts on the 5th line. I need a way to retrieve the biography from the fifth line to the end of the file, but I don't know how to do this. Thanks in advance.

Here's what I tried:

from tkinter import *
import os

class App:

    charprefix = "character_"
    charsuffix = ".iacharacter"
    chardir = "data/characters/"


    def __init__(self, master):
        self.master = master
        frame = Frame(master)
        frame.pack()

        # character box
        Label(frame, text = "Characters Editor").grid(row = 0, column = 0, rowspan = 1, columnspan = 2)
        self.charbox = Listbox(frame)
        for chars in []:
            self.charbox.insert(END, chars)
        self.charbox.grid(row = 1, column = 0, rowspan = 5)
        charadd = Button(frame, text = "   Add   ", command = self.addchar).grid(row = 1, column = 1)
        charremove = Button(frame, text = "Remove", command = self.removechar).grid(row = 2, column = 1)
        charedit = Button(frame, text = "    Edit    ", command = self.editchar).grid(row = 3, column = 1)

        for index in self.charbox.curselection():
            charfilelocale = self.charbox.get(int(index))
            charfile = open(app.chardir + app.charprefix + app.charfilelocale, 'r+')
            charinfo = str.splitlines(0)

If you just want to put the entire biography in a string, you can do this:

with open('biography.txt') as f:
    for i in range(4): # Read the first four lines
        f.readline()
    s = ''
    for line in f:
        s += line

The loop "for line in f" iterates over f. A file object is its own iterator: each step returns the next line, just as f.readline() would, until the end of the file is reached.

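As a more compact sketch of the same idea, itertools.islice can skip the leading lines and str.join can collect the rest. The helper name read_biography and its skip parameter are illustrative assumptions, not part of the original answer.

```python
from itertools import islice


def read_biography(path, skip=4):
    """Return everything after the first `skip` lines as one string."""
    with open(path) as f:
        # islice(f, skip, None) consumes and discards the first `skip` lines,
        # then yields the remaining lines one at a time.
        return ''.join(islice(f, skip, None))
```

Calling read_biography('biography.txt') would give the same string s as the loop above, and an empty string if the file has four lines or fewer.
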
Another way to phrase your question would be "how do I discard the first four lines of a file I read?" Taking the answer to that a step at a time:

filename = "/a/text/file"
input_file = open(filename)

where the default mode for open() is 'r', so you don't have to specify it.

contents = input_file.readlines()
input_file.close()

where readlines() returns a list of all the lines in the input file in one gulp. You were going to have to read it all anyway, so let's do it with one method call. And, of course, close() because you are a tidy coder. Now you can use list slicing to get the part that you want:

biography = contents[4:]

which didn't actually throw away the first four lines; it just assigned all but the first four to biography. Making this a little more idiomatic gives:

with open(filename) as input_file:
    biography = input_file.readlines()[4:]

The with context manager is useful to know, but look it up when you are ready. Here it saved you the close(), but it is a little more powerful than just that.
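To illustrate the "more powerful" part, here is a small sketch showing that the file is closed even when an exception is raised inside the with block. The temporary file and the simulated ValueError are assumptions made only so the example is self-contained.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("line 1\nline 2\n")

try:
    with open(path) as input_file:
        first = input_file.readline()
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

# The file was closed when the with block exited, despite the exception.
was_closed = input_file.closed
os.remove(path)
```

With a bare open() and close() pair, the close() call would have been skipped when the exception was raised.
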

Added in response to a comment:

Something like

with open(filename) as input_file:
    contents = input_file.readlines()
person = contents[0]
birth_year = contents[1]
...
biography = contents[4:]

but I think you figured that bit out while I was typing it.
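One detail worth noting is that readlines() keeps the trailing newline on every line, so single-line fields usually need stripping. A minimal sketch, assuming a hypothetical helper split_record and a four-line header as described in the question:

```python
def split_record(path, header_lines=4):
    """Return (header, biography) read from the file at `path`.

    header is a list of the first `header_lines` lines without newlines;
    biography is the rest of the file as a single string.
    """
    with open(path) as input_file:
        contents = input_file.readlines()
    # readlines() keeps the trailing newline on each line; strip it from the fields.
    header = [line.rstrip("\n") for line in contents[:header_lines]]
    biography = "".join(contents[header_lines:])
    return header, biography
```

Slicing never raises IndexError, so a file with fewer lines than expected simply yields a shorter header and an empty biography.
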

f = open('workfile', 'r')

for line in f:
    print(line, end='')

This is the first line of the file.
Second line of the file

Python does not require you to know in advance how big a file is or how many lines it contains. Iterating over a file object fetches the lines lazily, one at a time. You can find some excellent documentation here: http://docs.python.org/2/tutorial/inputoutput.html
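As a sketch of that lazy behaviour applied to the question, a generator can number the lines as they arrive and pass along only those from the fifth onward, without ever knowing the file's length. The name lines_from and its first_line parameter are hypothetical.

```python
def lines_from(path, first_line=5):
    """Yield lines of the file lazily, starting at 1-based line `first_line`."""
    with open(path) as f:
        # enumerate counts lines as they are read; nothing is loaded ahead of time.
        for number, line in enumerate(f, start=1):
            if number >= first_line:
                yield line
```

The biography is then ''.join(lines_from('biography.txt')), or the lines can be processed one at a time in a for loop.
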
