
py2neo cursor appears to consume everything into memory rather than stream data

I am running a query against a Neo4j server that I expect to return >100M rows (but just a few columns), and then writing the results into a CSV file. This works well for queries that return up to 10-20M rows, but becomes tricky as the row count goes up into the 10^8 range.

I thought writing the results row by row (ideally buffered) would be a solution, but the csv writer appears to only write to disk once the whole code has executed (i.e. at the end of the iteration), rather than in chunks as expected. In the example below, I tried explicitly flushing the file, which did not work. I also do not get any output on stdout, which indicates that the iteration is not occurring as intended.

The memory usage of the process is growing rapidly, however; it was over 12 GB last I checked. That makes me think the cursor is trying to fetch all the data before the iteration starts, which it should not do, unless I have misunderstood something.

Any ideas?

from py2neo import Graph
import csv

# `g` (a Graph connection) and `query` (the Cypher query string) are assumed
# to be defined earlier; their definitions are omitted here.
cursor = g.run(query)
with open('bigfile.csv', 'w') as csvfile:
    fieldnames = cursor.keys()
    writer = csv.writer(csvfile)

#    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
#    writer.writeheader()
    i = 0
    j = 1
    for rec in cursor:
#        writer.writerow(dict(rec))
        writer.writerow(rec.values())
        i += 1
        if i == 50000:
            print(str(i * j) + '...')  # progress marker every 50,000 rows
            csvfile.flush()            # explicit flush; did not help
            i = 0
            j += 1

Isn't the main problem the size of the query, rather than the method of writing the results to the CSV file? If you're chunking the writing process, perhaps you should chunk the querying process as well, since the results are stored in memory while the file writing is taking place. A sketch of what that might look like is below.
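A minimal sketch of chunked querying via Cypher SKIP/LIMIT paging, assuming the query has a deterministic ORDER BY so the paging is stable. The connection details, query text, and chunk size are illustrative placeholders, not taken from the original post:

from py2neo import Graph
import csv

# Illustrative connection and query; replace with your own.
g = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
query = "MATCH (n:Item) RETURN n.id AS id, n.name AS name ORDER BY n.id"

CHUNK_SIZE = 500000

with open('bigfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    skip = 0
    while True:
        # Fetch one page of results; .data() materialises only this chunk.
        rows = g.run(query + " SKIP %d LIMIT %d" % (skip, CHUNK_SIZE)).data()
        if not rows:
            break
        for rec in rows:
            writer.writerow(rec.values())
        csvfile.flush()
        skip += CHUNK_SIZE
        print(str(skip) + '...')

Note that large SKIP offsets can themselves become expensive on the server side, so keyset-style paging (e.g. a WHERE n.id > $last_id filter) may scale better; either way, the point is that only one chunk of results is held in memory at a time.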
