
Read file line-by-line or store in memory?

This is less of a "my code's broken" question and more of a "should I do this?" question.

I have a script that iterates line-by-line using something like this: reader = csv.DictReader(open('file.txt', 'rb'), delimiter='\t') and gets things like ages and dates without committing the whole thing to memory.

As it stands, the script uses about 5% of my RAM (8GB).

In general, is it more accepted to put a file into memory instead of opening it and looping through its contents -- especially if it's large (over 700MB)?

My script is for personal use, but I'd rather learn Python's conventions and do what's considered acceptable. For example, I know that if I were doing something similar in JavaScript I'd try to conserve memory as much as possible to prevent browsers from crashing or becoming unresponsive.

Is one method (memory vs. looping) preferred over the other in Python?

edit: I'm aware this could be kind of broad. I'm more curious as to the best (Pythonic) practice.

There seem to be a lot of posts asking how to do it, but not many asking why or if.

AFAIK, your method is the Pythonic way to do this.

You should be aware that open('file.txt') does not put the whole file into memory. It returns a file object, which is an iterator that reads the file on demand. The same goes for your DictReader.

Just try processing a large file: you won't see memory consumption grow with the size of the file.
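To make the on-demand behaviour concrete, here is a minimal sketch in Python 3 syntax (the question's 'rb' mode suggests Python 2, so treat this as an adaptation). The column names, the sample data and the function count_rows_over_age are made up for illustration. An io.StringIO stands in for the real file so the snippet runs on its own.

```python
import csv
import io

def count_rows_over_age(lines, min_age):
    """Stream tab-separated rows and count those with age >= min_age."""
    reader = csv.DictReader(lines, delimiter="\t")
    count = 0
    for row in reader:  # only the current row is held in memory
        if int(row["age"]) >= min_age:
            count += 1
    return count

# Hypothetical sample data; with a real file you would pass
# open('file.txt', newline='') instead of the StringIO object.
sample = io.StringIO(
    "name\tage\tdate\n"
    "ann\t34\t2001-05-01\n"
    "bob\t19\t1999-12-31\n"
)
result = count_rows_over_age(sample, 21)
```

Because the function only keeps a running count, its memory use should stay roughly constant no matter how many rows the input has.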

Most of the time, it's better to process the file as you read it. The operating system expects such behaviour, so it reads ahead a bit to compensate for the latency of the disk system. Loading the file in its entirety will normally reserve that memory for your process alone, which is wasteful if you only scan through it once. You could mmap it, which lets the system use disk buffers directly, but that loses the hint of where you will be reading next. Reading in chunks that are too small lets system call overhead dominate, so you'll want to read fairly large chunks if possible, but for most programs the default buffering while reading lines is sufficient.
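If you do want to control the chunk size yourself, something along these lines would work. This is only a sketch: the 1 MiB figure is an arbitrary illustrative value rather than a tuned recommendation, count_newlines is a hypothetical helper, and an io.BytesIO stands in for a file opened in binary mode.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB; illustrative, not a measured optimum

def count_newlines(stream, chunk_size=CHUNK_SIZE):
    """Count newline bytes by reading fixed-size chunks from a binary stream."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        total += chunk.count(b"\n")
    return total

# Stand-in for open('file.txt', 'rb'); a tiny chunk size is used here
# only to exercise the loop more than once.
data = io.BytesIO(b"a\nb\nc\n")
newline_count = count_newlines(data, chunk_size=4)
```

Only one chunk is alive at a time, so peak memory is bounded by the chunk size rather than by the file size.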
