
How to get the nth character in extremely large text file?

I have a very large text file (~40GB) containing unseparated digits. It's been a while since I've dealt with file I/O in python (or python more generally), and I remember some wizardry with generators being used to access such files. Google yielded little specific help; it seems like everyone deals with sensibly-formatted data they can access line-by-line. All I need to do is read the nth character without destroying the kernel by reading too much into RAM. Any ideas?

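You can use f.seek to jump to the nth byte in the file and then f.read(1) to read it. In a single-byte encoding such as ASCII (which covers a file of plain digits), the nth byte is also the nth character. Open the file in binary mode so that the seek offset is an exact byte position. Note that seek only moves the file position and returns the new offset, not the character itself:

with open("file.txt", "rb") as f:
    f.seek(N - 1)                      # 0-based offset of the Nth character
    char = f.read(1).decode("ascii")   # read one byte and decode it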
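
To make the seek-then-read idea concrete, here is a minimal sketch wrapped in a helper. The function name nth_char, the 1-based indexing, and the temporary demo file are illustrative assumptions, not part of the original answer. It also assumes the file is pure ASCII, which holds for a file of unseparated digits.

```python
import os
import tempfile


def nth_char(path, n):
    """Return the n-th character (1-based) of a single-byte-encoded file."""
    if n < 1:
        raise ValueError("n must be at least 1")
    with open(path, "rb") as f:
        f.seek(n - 1)      # jump to the byte offset; nothing before it is read
        byte = f.read(1)   # read exactly one byte
    if not byte:
        # read() returns b"" when the position is at or past end of file
        raise IndexError("n is past the end of the file")
    return byte.decode("ascii")  # assumption: the file contains only ASCII


# Small demonstration on a temporary file standing in for the 40 GB one.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"3141592653")
    demo_path = tmp.name

third = nth_char(demo_path, 3)  # third digit of "3141592653"
os.remove(demo_path)
```

Only the single requested byte (plus whatever the default buffer fetches around it) is read, so the cost should not depend on how far into the file n lies.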

Use seek, which moves the file position to the given offset, then call read.

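Additionally, if you really don't want any extra data loaded into memory during the read (just one byte/character), also pass buffering=0 when opening the file. Python only allows unbuffered I/O in binary mode, so the file has to be opened with "rb"; in text mode, buffering=0 raises a ValueError.

with open("largeFile", "rb", buffering=0) as f:
    f.seek(10000)       # 0-based byte offset
    char = f.read(1)    # a bytes object of length 1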
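
If several positions are needed, the same unbuffered handle can be reused rather than reopening the file each time. The sketch below is only an illustration under stated assumptions: the helper name chars_at and the demo file are hypothetical, offsets are 0-based byte offsets, and the content is assumed to be ASCII.

```python
import os
import tempfile


def chars_at(path, positions):
    """Return the characters found at the given 0-based byte offsets."""
    result = []
    # buffering=0 is only valid in binary mode; it yields a raw, unbuffered file.
    with open(path, "rb", buffering=0) as f:
        for pos in positions:
            f.seek(pos)  # move to the requested offset
            # read(1) returns b"" past end of file, which decodes to ""
            result.append(f.read(1).decode("ascii"))  # assumption: ASCII content
    return result


# Demonstration on a small temporary file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"0123456789")
    demo_path = tmp.name

picked = chars_at(demo_path, [0, 5, 9, 10])
os.remove(demo_path)
```

An offset at or beyond the end of the file yields an empty string here instead of raising, so callers that care should check for it.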

