Pandas has the excellent .read_table() function, but huge files result in a MemoryError.
Since I only need to load the lines that satisfy a certain condition, I'm looking for a way to only load those.
This could be done using a temporary file:
    with open(hugeTdaFile) as huge:
        with open(hugeTdaFile + ".partial.tmp", "w") as tmp:
            tmp.write(huge.readline())  # copy the header line
            for line in huge:
                if SomeCondition(line):
                    tmp.write(line)
    # read the filtered file only after the "with" blocks close it,
    # so the buffered writes are flushed to disk
    t = pandas.read_table(hugeTdaFile + ".partial.tmp")
Is there a way to avoid such a use of a temp file?
You can use the chunksize parameter, which makes read_table return an iterator over the file in pieces, so you can filter each chunk as it is read instead of loading everything at once.
See: http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk
(Alternatively, you could write the filtered chunks out to new CSVs or HDFStores or whatever.)
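A minimal sketch of the chunked approach, using an in-memory buffer with a hypothetical "value" column to stand in for the huge file and a placeholder condition (keep rows where value > 0) in place of your SomeCondition:

```python
import io

import pandas as pd

# Stand-in for the huge tab-separated file; replace with your file path.
data = io.StringIO("id\tvalue\n1\t-3\n2\t7\n3\t0\n4\t12\n")

filtered = []
# chunksize makes read_table yield DataFrames of at most 2 rows each;
# only the current chunk is in memory at a time.
for chunk in pd.read_table(data, chunksize=2):
    # keep only the rows that satisfy the condition
    filtered.append(chunk[chunk["value"] > 0])

t = pd.concat(filtered, ignore_index=True)
print(t["value"].tolist())  # → [7, 12]
```

Note that the filtered result still has to fit in memory; if even that is too large, write each filtered chunk out as you go instead of concatenating.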