How to extract and save in .csv chunks of data from a large .csv file iteratively using Python?
How to download large data from a SQL table and consecutively save into csv by fetching 1000 or so records at once
I have a SQL table with 10 million rows and many columns; the table is about 44 GB when queried.
But when I try to fetch only 3 columns from this table and save them to a csv / load them into a dataframe, Python runs forever, i.e.
pd.read_sql("select a,b,c from table") is taking more than 1 hour and not returning data
How can I achieve this? 1. Can I load this entire data into a dataframe at once — is that a viable option? I should then be able to perform some data manipulation on these rows. 2. Or should I download it to csv and read the data into memory part by part?
If option 2, how do I code it?
The code I have tried so far for option 2 is:
def iter_row(cursor, size=10):
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        for row in rows:
            yield row

def query_with_fetchmany():
    cursor.execute("SELECT * FROM books")
    for row in iter_row(cursor, 10):
        print(row)
    cursor.close()
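To turn the fetchmany loop above into option 2 (streaming rows straight to a csv), the generator can feed a `csv.writer` instead of `print`. Here is a minimal, self-contained sketch; the in-memory SQLite `books` table, its columns `a, b, c`, and the batch size are stand-ins for illustration — swap in your real connection and query:

```python
import csv
import sqlite3

# Hypothetical stand-in for the real 44 GB table: an in-memory SQLite
# database with a small `books` table so the sketch is runnable.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE books (a INTEGER, b TEXT, c REAL)")
connection.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(i, f"title-{i}", i * 1.5) for i in range(25)],
)

def iter_rows(cursor, size=1000):
    """Yield rows one at a time, fetching `size` rows per round trip."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield from rows

cursor = connection.cursor()
cursor.execute("SELECT a, b, c FROM books")

with open("books.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b", "c"])        # header once, at the top
    for row in iter_rows(cursor, size=10):  # stream in batches of 10
        writer.writerow(row)

cursor.close()
```

This never holds more than one batch of rows in memory, so it works regardless of table size; only the batch size trades round trips against memory.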
You can read the data in chunks:
for c in pd.read_sql("select a,b,c from table", con=connection, chunksize=10**5):
    c.to_csv(r'/path/to/file.csv', index=False, mode='a')
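One caveat with the chunked append: `mode='a'` writes the column header again for every chunk, so the resulting csv contains repeated header rows. A hedged sketch of one way to handle it — write the header only for the first chunk (the in-memory SQLite `my_table` and the relative output path are illustrative stand-ins for the real connection and path):

```python
import sqlite3
import pandas as pd

# Stand-in for the real database: a small table with columns a, b, c.
connection = sqlite3.connect(":memory:")
pd.DataFrame({"a": range(7), "b": range(7), "c": range(7)}).to_sql(
    "my_table", connection, index=False
)

# Overwrite with a header on the first chunk, then append without one.
first = True
for chunk in pd.read_sql("select a,b,c from my_table", con=connection,
                         chunksize=3):
    chunk.to_csv("file.csv", index=False,
                 mode="w" if first else "a", header=first)
    first = False
```

The same `header=first` trick applies unchanged with `chunksize=10**5` against the real table.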