
Read part of the text from a large file as a page in Python

In Python, file.readlines() reads every line into memory, which is wasteful when the file is large (several MB). Is there an efficient way to read just part of a file as a page? Typically that part of the text is displayed as a page in a web app, and the text may be decorated further before display.

Currently I have thought of a rough approach based on byte size:

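def getpage(fname, pageindex, pagesize=100, ahead=20):
    """Read a page roughly by byte size, aligned to line boundaries."""
    # Start a little before the nominal page offset, but never before byte 0.
    pos = max(pageindex * pagesize - ahead, 0)
    # Binary mode, so that seek() can be given an arbitrary byte offset.
    with open(fname, 'rb') as f:
        f.seek(pos)
        if pos > 0:
            # Discard the (probably partial) line the seek landed in.
            f.readline()
        txt = f.read(pagesize)
        # Read to the end of the current line so the page does not stop mid-line.
        txt += f.readline()
    # Assumes UTF-8 text; change the encoding to match the file.
    return txt.decode('utf-8', errors='replace')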

The pages do not have a fixed number of lines: some come out sparse, others dense. For a moderate pagesize, though, that is acceptable for the user.
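If pages with a fixed number of lines are wanted while still seeking by byte offset, one option is to scan the file once and record the byte offset at which each page starts. The sketch below is only an illustration under stated assumptions, not something from the original post: `build_page_index` and `read_page` are hypothetical names, and the file is assumed to be UTF-8 text.

```python
def build_page_index(fname, lines_per_page=100):
    """Return the byte offset at which each page starts (hypothetical helper)."""
    offsets = []
    pos = 0
    with open(fname, 'rb') as f:
        for line_num, line in enumerate(f):
            if line_num % lines_per_page == 0:
                offsets.append(pos)
            # Track the byte position manually while iterating.
            pos += len(line)
    return offsets


def read_page(fname, offsets, pageindex, lines_per_page=100):
    """Read one zero-based page by seeking to its recorded offset."""
    lines = []
    with open(fname, 'rb') as f:
        f.seek(offsets[pageindex])
        for _ in range(lines_per_page):
            line = f.readline()
            if not line:  # end of file reached
                break
            # Assumption: the file is UTF-8 encoded.
            lines.append(line.decode('utf-8', errors='replace').rstrip('\r\n'))
    return lines
```

The index costs one full pass over the file, so it pays off only when the same file is paged through many times and does not change in between.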

You can do this in a very Pythonic and efficient way using generators:

def getPage(fileName, numberOfLinesInAPage):
    """Yield successive pages, each a list of stripped lines."""
    # The with block closes the file when the generator finishes or is closed.
    with open(fileName) as f:
        pageBuffer = []
        for lineNum, eachLine in enumerate(f, 1):
            pageBuffer.append(eachLine.strip())
            if lineNum % numberOfLinesInAPage == 0:
                yield pageBuffer
                pageBuffer = []
        # Yield the final, possibly shorter page.
        if pageBuffer:
            yield pageBuffer

for page in getPage('test.txt', 100):
    print(page)
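The generator above walks the file page by page from the start. When a web app needs only one page by its index, `itertools.islice` can skip the preceding lines lazily without keeping them in memory. A minimal sketch, where `get_page_by_index` is a hypothetical name and pages are assumed to be zero-based:

```python
from itertools import islice


def get_page_by_index(file_name, page_index, lines_per_page=100):
    """Return the lines of one zero-based page (hypothetical helper)."""
    start = page_index * lines_per_page
    with open(file_name) as f:
        # islice reads and discards the lines before `start`, one at a time.
        page = islice(f, start, start + lines_per_page)
        return [line.rstrip('\n') for line in page]
```

The earlier lines are still read from disk, so later pages take longer to reach than earlier ones; only memory use stays bounded by the page size.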

