How to get the byte offset in a file in Python

I am building an inverted index using Hadoop and Python. How can I include the byte offset of a line or word in Python? I need output something like this:

hello hello.txt@1124

I need the locations in order to build a full inverted index. Please help.

Like this?

file.tell()

Return the file's current position, like stdio's ftell().

http://docs.python.org/library/stdtypes.html#file-objects
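
For a regular file on disk, tell() can be used roughly as in the sketch below. The helper name line_offsets and the throwaway demo file are made up for illustration. The file is opened in binary mode so that the reported positions are byte offsets, and readline() is used rather than iterating over the file object.

```python
import os
import tempfile

def line_offsets(path):
    """Yield (byte_offset, line) pairs for every line of a file."""
    # Binary mode, so tell() reports byte positions rather than text positions.
    with open(path, "rb") as f:
        while True:
            offset = f.tell()      # position before the line is read
            line = f.readline()
            if not line:           # empty bytes object means end of file
                break
            yield offset, line

# Small demonstration on a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello world\nsecond line\n")
demo_offsets = [offset for offset, _ in line_offsets(tmp.name)]
os.remove(tmp.name)
print(demo_offsets)  # prints [0, 12]
```

This only works for seekable files; it does not help when the data arrives on a pipe.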

Unfortunately, tell() does not work here because the OP is reading from stdin rather than from a regular file. It is not hard, though, to build a wrapper around the stream that gives you what you need.

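class file_with_pos(object):
    """Wrap a non-seekable stream and count how much has been read from it."""
    def __init__(self, fp):
        self.fp = fp
        self.pos = 0  # offset of the next unread item
    def read(self, *args):
        data = self.fp.read(*args)
        self.pos += len(data)  # a byte count only if fp yields bytes
        return data
    def readline(self, *args):
        # Same bookkeeping as read(), for line-by-line consumers.
        line = self.fp.readline(*args)
        self.pos += len(line)
        return line
    def tell(self):
        return self.pos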

Then you can use this instead:

import sys

# On Python 3, wrap sys.stdin.buffer instead so that pos counts bytes, not characters.
fp = file_with_pos(sys.stdin)
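
As a rough sketch of how the same byte-counting idea could produce entries in the word/file@offset shape asked for in the question: the function name index_entries, the tab-separated output layout, and the sample data are assumptions made for illustration, not something the question specifies. The file name is passed in by the caller; how a Hadoop job supplies it is left open here.

```python
import io
import re

# Bytes pattern: matches runs of ASCII letters, digits and underscores only.
# Words containing non-ASCII characters would need different handling.
WORD_RE = re.compile(rb"\w+")

def index_entries(stream, filename):
    """Yield 'word<TAB>filename@offset' for every word in a binary stream."""
    line_start = 0  # byte offset of the current line within the stream
    for line in stream:
        for match in WORD_RE.finditer(line):
            word = match.group().decode("ascii").lower()
            yield "%s\t%s@%d" % (word, filename, line_start + match.start())
        line_start += len(line)  # len() of bytes is a byte count

# Demonstration on an in-memory stream standing in for stdin.
sample = io.BytesIO(b"hello world\nhello again\n")
entries = list(index_entries(sample, "hello.txt"))
for entry in entries:
    print(entry)
```

In a Python 3 script reading from a pipe, the stream argument would be sys.stdin.buffer. The offsets are relative to the start of whatever stream the script receives, so they match file offsets only if the script sees the file from its first byte.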
