
How to get byte offset in a file in Python

I am building an inverted index using Hadoop and Python. I want to know how I can include the byte offset of a line or word in Python. I need something like this:

hello hello.txt@1124

I need the locations to build a full inverted index. Please help.

Like this?

file.tell()

Return the file's current position, like stdio's ftell().

http://docs.python.org/library/stdtypes.html#file-objects
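For a regular file on disk, recording tell() before each readline() gives the byte offset where every line starts. The following is only a minimal sketch of that idea: the helper name `line_offsets` and the temporary demo file are made up for illustration, and the file is opened in binary mode so that positions are counted in bytes.

```python
import os
import tempfile

def line_offsets(path):
    """Yield (byte_offset, line_bytes) for each line of the file at `path`.

    `line_offsets` is a hypothetical helper, not part of any library.
    """
    with open(path, "rb") as f:  # binary mode: tell() reports byte positions
        while True:
            offset = f.tell()    # position before the line is read
            line = f.readline()
            if not line:         # empty bytes object means end of file
                break
            yield offset, line

# Demo on a throwaway file (contents are an assumption for illustration).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello world\nhello again\n")
    demo_path = tmp.name

offsets = [(off, line.rstrip(b"\n")) for off, line in line_offsets(demo_path)]
os.remove(demo_path)
print(offsets)  # [(0, b'hello world'), (12, b'hello again')]
```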

Unfortunately, tell() does not work here because the OP is reading from stdin rather than a regular file. But it is not hard to build a wrapper around it that gives you what you need.

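class file_with_pos(object):
    """Wrap a stream that cannot tell() and count what has been read."""
    def __init__(self, fp):
        self.fp = fp
        self.pos = 0  # amount of data consumed so far
    def read(self, *args):
        data = self.fp.read(*args)
        # len() counts bytes only if fp yields bytes (e.g. sys.stdin.buffer in Python 3)
        self.pos += len(data)
        return data
    def readline(self, *args):
        line = self.fp.readline(*args)
        self.pos += len(line)
        return line
    def tell(self):
        return self.pos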

Then you can use this instead:

import sys

fp = file_with_pos(sys.stdin)
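The same running-count idea can produce the `word file@offset` entries the question asks for. This is a rough sketch, not the answerer's code: it keeps its own counter instead of using the wrapper class, and the function name `postings`, the tab separator, and the in-memory demo stream are assumptions for illustration. Offsets are relative to the start of whatever stream is passed in.

```python
import io
import re

WORD_RE = re.compile(rb"\S+")  # a "word" here is any run of non-whitespace bytes

def postings(stream, filename):
    """Yield 'word<TAB>filename@offset' for every word in a binary stream.

    `postings` is a hypothetical helper; `filename` is supplied by the caller.
    """
    line_start = 0  # byte offset of the current line within the stream
    for line in stream:
        for match in WORD_RE.finditer(line):
            word = match.group().decode("utf-8", errors="replace")
            yield "%s\t%s@%d" % (word, filename, line_start + match.start())
        line_start += len(line)  # len() of bytes is a byte count

# Demo with an in-memory stream standing in for stdin.
demo = io.BytesIO(b"hello world\nhello again\n")
result = list(postings(demo, "hello.txt"))
for entry in result:
    print(entry)
```

In Python 3, passing `sys.stdin.buffer` rather than `sys.stdin` keeps the stream in bytes, so the counts stay byte offsets rather than character offsets.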
