Clean Python RAM memory

I have a few scripts that download data from BigQuery, make the data easier to handle, and transfer it to PostgreSQL. The problem is that the files from BigQuery are quite large. They are split by day, and each day holds around 700-1500 MB of data, which is processed with pandas dataframes. I tried to write the script so that it handles the files one by one, but I run out of memory.

Calling gc.collect() at the end of the loop doesn't help, and using del(n, h, r) to delete each dataframe in every iteration doesn't work as needed either. I still run out of RAM.
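Roughly, the per-file pattern looks like this (the file format, transform step, table name and connection string below are illustrative, not my exact code):

import gc

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@host/db")  # placeholder connection string

for path in file_paths:                    # one exported BigQuery file per day (placeholder list)
    df = pd.read_csv(path)                 # ~700-1500 MB loaded into memory here
    df = transform(df)                     # placeholder for the "make it easier to handle" step
    df.to_sql("target_table", engine, if_exists="append", index=False)
    del df                                 # drop the reference to the dataframe...
    gc.collect()                           # ...and force a collection - RAM usage still grows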

I tried running the script in a loop, thinking that might help:

import os
import gc

for dir, subdir, files in os.walk(source_path):
    for file in files:
        # run script.py inside this same process for each file
        exec(open("script.py").read())
        gc.collect()

At the end of script.py I also have gc.collect() and del(). Still, it can handle 2-3 files at most and then it runs out of memory.

I tried putting sys.exit at the end of script.py, but in that case the loop above breaks after the first file.

How can I avoid running out of memory? Basically, how do I free the RAM used by the file from the previous iteration of the loop and continue to the next one?
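One way to get around both problems (memory not being released, and sys.exit killing the outer loop) is to run script.py as a separate child process for each file: when the child process exits, the operating system reclaims all of its memory. A minimal sketch, assuming script.py is adapted to take the file path as a command-line argument:

import os
import subprocess
import sys

for dirpath, subdirs, files in os.walk(source_path):
    for file in files:
        # each file is handled in its own Python process; all of its memory is
        # returned to the OS as soon as that process finishes
        subprocess.run(
            [sys.executable, "script.py", os.path.join(dirpath, file)],
            check=True,
        )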

A better way of handling this is to use pandas' chunked reads:

for chunk in pd.read_sql_query(sql, con, chunksize=10000):
    # upload each chunk into PG, so you're not reading the entire table at once
    chunk.to_sql("target_table", pg_engine, if_exists="append", index=False)  # placeholder table/engine names
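The same idea applies to the files already downloaded from BigQuery: instead of loading a whole 700-1500 MB file into a single dataframe, read it in chunks and append each chunk to PostgreSQL. A sketch assuming CSV exports and a SQLAlchemy engine (the file path, table name, connection string and transform step are placeholders):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@host/db")  # placeholder connection string

# only ~100,000 rows are held in memory at any one time, regardless of file size
for chunk in pd.read_csv("bq_export_day.csv", chunksize=100_000):
    chunk = transform(chunk)  # placeholder for the per-file cleanup step
    chunk.to_sql("target_table", engine, if_exists="append", index=False)

With if_exists="append", each chunk is added to the same target table, so the result is the same as loading the whole file at once, just with a bounded memory footprint.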
