
How to download a large amount of data from a SQL table and save it to CSV incrementally by fetching 1000 or so records at a time

I have a SQL table consisting of 10 million rows and a lot of columns; the table size when queried is around 44 GB.

However, I am trying to fetch only 3 columns from this table and save them to CSV / load them into a dataframe, but Python keeps running forever, i.e.

 pd.read_sql("select a,b,c from table") is taking more than 1 hour and not returning data

How can I achieve this?

1. Can I load this entire data into a dataframe at once? Is that a viable option? After this I should be able to perform some data manipulations on these rows.

2. OR should I download this to CSV and read the data into memory part by part?

If it is option 2, how do I code it?

The code I have tried for option 2 so far is:

    def iter_row(cursor, size=10):
        # Yield rows from the cursor in batches of `size`
        while True:
            rows = cursor.fetchmany(size)
            if not rows:
                break
            for row in rows:
                yield row

    def query_with_fetchmany(cursor):
        cursor.execute("SELECT * FROM books")
        for row in iter_row(cursor, 10):
            print(row)
        cursor.close()
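For illustration, here is a minimal sketch of how this fetchmany pattern could be extended to stream the three columns straight into a CSV file in batches of 1000 rows. The connection object conn, the helper name dump_query_to_csv, and the output path out.csv are assumptions for the example, not part of the original question; creating the connection depends on your database driver (e.g. pymysql or pyodbc).

    import csv

    def dump_query_to_csv(conn, query, out_path, batch_size=1000):
        # Stream a query result to CSV without holding it all in memory
        cursor = conn.cursor()
        cursor.execute(query)
        with open(out_path, 'w', newline='') as f:
            writer = csv.writer(f)
            # Header row taken from the cursor metadata
            writer.writerow(col[0] for col in cursor.description)
            while True:
                rows = cursor.fetchmany(batch_size)
                if not rows:
                    break
                writer.writerows(rows)
        cursor.close()

    # Usage (hypothetical connection and path):
    # dump_query_to_csv(conn, "select a,b,c from table", "out.csv")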

You can read the data in chunks:

for i, c in enumerate(pd.read_sql("select a,b,c from table", con=connection, chunksize=10**5)):
    # write the header only for the first chunk, then append the rest
    c.to_csv(r'/path/to/file.csv', index=False, header=(i == 0), mode='a')
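
The saved file can then be read back and processed piece by piece in the same way; a short sketch, where process() is a hypothetical placeholder for the data manipulation step:

for chunk in pd.read_csv(r'/path/to/file.csv', chunksize=10**5):
    # process() stands in for whatever per-chunk manipulation is needed
    result = process(chunk)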
