
Find the last occurrence of a word in a large file with python

I have a very large text file. I want to search for the last occurrence of a specific word and then perform certain operations on the lines that follow it.

I can do something like:

for line in open("filename"):
    if "word" in line.split():
        pass  # do something with this line

I am only interested in the last occurrence of "word" however.

An easier and quicker solution is to go through the lines of the file in reverse order and stop at the first line that contains the word. Note that readlines() loads the whole file into memory.

In Python 2.6 you can do something like this (where word is the string you are looking for):

for line in reversed(open("filename").readlines()):
    if word in line:
        # Do the operations here when you find the line
        break  # the first hit in reverse order is the last occurrence
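
Since the goal is to work on the lines after the match, it can help to keep the index of the matching line rather than only the line itself. Below is a minimal sketch for Python 3, assuming the lines fit in memory. The helper name lines_after_last_match is made up for illustration, and the whole-word test with split() is borrowed from the question rather than from the answer above.

```python
def lines_after_last_match(lines, word):
    """Return the lines that follow the last line containing `word`
    as a whitespace-separated token, or None if no line matches."""
    # Walk the indices from the end so the first hit is the last occurrence.
    for index in range(len(lines) - 1, -1, -1):
        if word in lines[index].split():
            return lines[index + 1:]
    return None


sample = ["alpha\n", "word one\n", "beta\n", "word two\n", "gamma\n", "delta\n"]
following = lines_after_last_match(sample, "word")
print(following)  # ['gamma\n', 'delta\n']
```

With a real file, the list passed in would be the result of readlines() on the open file.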

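Try this:

with open('file.txt', 'r') as f:
    text = f.read()          # reads the whole file into memory
answer = text.find('word')   # index of the FIRST occurrence, or -1 if absent

Note that find returns the index of the first occurrence, so to reach the last one you have to keep searching from that position or search from the end instead.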

You may also use str.rfind

str.rfind(sub[, start[, end]])

Return the highest index in the string where substring sub is found, such that sub is contained within s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure.
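
A short illustration of how rfind and its optional start and end arguments behave, written for Python 3. The sample string is made up for the example.

```python
text = "word a word b word c"

last = text.rfind("word")                  # highest index where "word" starts
before_last = text.rfind("word", 0, last)  # search only within text[0:last]
missing = text.rfind("absent")             # -1 when the substring is not found

print(last, before_last, missing)  # 14 7 -1
```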

You can open your file, transform it into a list, reverse its order and iterate looking for your word.

with open('file.txt', 'r') as file_:
    line_list = list(file_)   # one list entry per line
    line_list.reverse()       # last line first

    for line in line_list:
        if line.find('word') != -1:
            # do something
            print(line)
            break  # stop at the last occurrence in the file

Optionally, you can set the size of the file buffer by passing it (in bytes) as the third argument to open. For instance: with open('file.txt', 'r', 1024) as file_:

If the file is hundreds of megabytes or even gigabytes in size, then you may want to use mmap so you don't have to read the entire file into memory. The rfind method finds the last occurrence of a string in the file.

import mmap

with open('large_file.txt', 'rb') as f:
    # memory-map the file, size 0 means whole file
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                          # access argument works on *nix and Windows

    i = m.rfind(b'word')  # search for last occurrence of 'word'
    if i != -1:           # rfind returns -1 when the word is absent
        m.seek(i)             # seek to the location
        line = m.readline()   # read to the end of the line
        print(line)
        nextline = m.readline()

Just keep calling readline() to read following lines.
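
The repeated readline() calls can be wrapped in a loop. Here is a minimal sketch for Python 3 that collects every line after the one holding the last match. The temporary file and its contents are made up so that the example runs on its own. A memory map works on bytes, so the search term is a bytes literal and the lines come back as bytes.

```python
import mmap
import os
import tempfile

# Build a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first word\nskip\nlast word here\nline A\nline B\n")
    path = tmp.name

following = []
with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        i = m.rfind(b"word")
        if i != -1:
            m.seek(i)
            m.readline()  # consume the rest of the matching line
            # readline() returns b"" at the end of the map, which stops iter().
            for line in iter(m.readline, b""):
                following.append(line)
    finally:
        m.close()

os.remove(path)
print(following)  # [b'line A\n', b'line B\n']
```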

If the file is extremely large (tens of gigabytes, say), you can map it in chunks with the length and offset arguments of mmap(). The offset must be a multiple of mmap.ALLOCATIONGRANULARITY.
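
One way the chunked mapping could look is sketched below for Python 3. It is an illustration under stated assumptions, not a tuned implementation. The function name rfind_in_chunks, its chunk_size parameter and the test file are all hypothetical. Chunks are visited from the end of the file toward the start, and each mapping is extended by len(needle) - 1 bytes so that a match straddling a chunk boundary is still seen.

```python
import mmap
import os
import tempfile


def rfind_in_chunks(path, needle, chunk_size=mmap.ALLOCATIONGRANULARITY * 256):
    """Return the byte offset of the last occurrence of `needle` (bytes)
    in the file at `path`, or -1 if it does not occur."""
    gran = mmap.ALLOCATIONGRANULARITY
    # mmap offsets must be multiples of the allocation granularity.
    chunk_size = max(gran, (chunk_size // gran) * gran)
    file_size = os.path.getsize(path)
    if file_size == 0 or not needle:
        return -1
    overlap = len(needle) - 1
    with open(path, "rb") as f:
        # Aligned start of the chunk that holds the last byte of the file.
        start = ((file_size - 1) // chunk_size) * chunk_size
        while start >= 0:
            length = min(chunk_size + overlap, file_size - start)
            m = mmap.mmap(f.fileno(), length,
                          access=mmap.ACCESS_READ, offset=start)
            try:
                i = m.rfind(needle)
            finally:
                m.close()
            if i != -1:
                return start + i
            start -= chunk_size
    return -1


# Throwaway file: one early match, and one that straddles a chunk boundary.
gran = mmap.ALLOCATIONGRANULARITY
data = b"word" + b"x" * (gran - 6) + b"word" + b"x" * (2 * gran)
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

found = rfind_in_chunks(path, b"word", chunk_size=gran)
os.remove(path)
print(found == data.rfind(b"word"))
```

Once the offset is known, the file can be reopened and positioned there with seek() to read the lines that follow, so only one chunk is mapped at any time.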
