简体繁体中英

Read file line-by-line or store in memory?

原文 2014-07-07 08:38:43 3 2 python/ file/ ram

This is less of a "my code's broken" question and more of a "should I do this?" question.

I have a script that iterates line-by-line using somethting like this: reader = csv.DictReader(open('file.txt', 'rb'), delimiter= '\\t') and gets things like ages and dates without committing the whole thing to memory.

As it stands, the script uses about 5% of my RAM (8GB).

In general, is it more accepted to put a file into memory instead of opening it and looping through its contents -- especially if it's large (over 700MB)?

My script is for personal use, but I'd rather learn Python's conventions and do what's considered acceptable. For example, I know that if I were doing something similar in JavaScript I'd try to conserve memory as much as possible to prevent browsers from crashing or becoming unresponsive.

Is a method (memory vs looping) preferred over another in Python?

edit: I'm aware this could be kind of broad. I'm more curious as to the best (Pythonic) practice.

There seems to be a lot of posts asking how to do it, but not a lot asking why or if .

2 answers

AFAIK, your method is the pythonic way to do this.

You should be aware of the fact that open('file.txt') does not put the whole file into memory. It returns an iterator which reads the file on demand. So does your DictReader .

Just try processing a large file, you won't see any increase in memory consumption.

Most of the time, it's better to process the file as you read it. The operating system expects such behaviour so it reads ahead a bit to compensate for the latency of the disk system. Loading the file in its entirety will normally reserve the memory used for only your process which is wasteful if you're only scanning through it once. You could mmap it, which lets the system use disk buffers directly, but that loses the hint of where you will be reading next. Reading too small chunks causes the system call overhead to dominate so you'll want to read fairly large chunks if possible, but for most programs the default buffering while reading lines is sufficient.

Read a file line-by-line instead of read file into memory

Read CSV file line-by-line python

How to read a file line-by-line into a list?

What is the least memory-intensive way to read a Parquet file in Python? Is line-by-line possible?

How should I read a file line-by-line in Python?

How to read a file line-by-line into a new list?

Read a text file line-by-line into a string with python

Reading a file line-by-line

Python line-by-line memory profiler?

暂无

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

Related Question Read a file line-by-line instead of read file into memory Read CSV file line-by-line python How to read a file line-by-line into a list? What is the least memory-intensive way to read a Parquet file in Python? Is line-by-line possible? How should I read a file line-by-line in Python? Read next and previous lines while reading file line-by-line How to read a file line-by-line into a new list? Read a text file line-by-line into a string with python Reading a file line-by-line Python line-by-line memory profiler?

Related Tags

Read file line-by-line or store in memory?

Question

2 answers

solution1
1 ACCPTED 2014-07-07 08:51:17

solution2
1 2014-07-07 09:03:06

Read file line-by-line or store in memory?

Question

2 answers

solution1 1 ACCPTED 2014-07-07 08:51:17

solution2 1 2014-07-07 09:03:06

solution1
1 ACCPTED 2014-07-07 08:51:17

solution2
1 2014-07-07 09:03:06