How to extract and save in .csv chunks of data from a large .csv file iteratively using Python?
How to download large data from a SQL table and consecutively save into csv by fetching 1000 or so records at once
I have a SQL table with 10 million rows and many columns; the table is about 44 GB when queried.
But when I try to fetch only 3 columns from this table and save them to a csv / load them into a dataframe, Python runs forever, i.e.
pd.read_sql("select a,b,c from table") is taking more than 1 hour and not returning data
How can I achieve this? 1. Can I load this entire data into a dataframe at once — is that a viable option? I should then be able to perform some data manipulation on these rows. 2. Or should I download it to csv and read the data into memory part by part?
If option 2, how do I code it?
The code I have tried so far for option 2 is:
def iter_row(cursor, size=10):
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        for row in rows:
            yield row

def query_with_fetchmany():
    cursor.execute("SELECT * FROM books")
    for row in iter_row(cursor, 10):
        print(row)
    cursor.close()
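To turn the fetchmany loop above into option 2 (streaming rows straight to a csv), the generator can feed a `csv.writer` instead of `print`. Here is a minimal, self-contained sketch; the in-memory SQLite `books` table, its columns `a, b, c`, and the batch size are stand-ins for illustration — swap in your real connection and query:

```python
import csv
import sqlite3

# Hypothetical stand-in for the real 44 GB table: an in-memory SQLite
# database with a small `books` table so the sketch is runnable.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE books (a INTEGER, b TEXT, c REAL)")
connection.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(i, f"title-{i}", i * 1.5) for i in range(25)],
)

def iter_rows(cursor, size=1000):
    """Yield rows one at a time, fetching `size` rows per round trip."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield from rows

cursor = connection.cursor()
cursor.execute("SELECT a, b, c FROM books")

with open("books.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b", "c"])        # header once, at the top
    for row in iter_rows(cursor, size=10):  # stream in batches of 10
        writer.writerow(row)

cursor.close()
```

This never holds more than one batch of rows in memory, so it works regardless of table size; only the batch size trades round trips against memory.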
You can read the data in chunks:
for c in pd.read_sql("select a,b,c from table", con=connection, chunksize=10**5):
    c.to_csv(r'/path/to/file.csv', index=False, mode='a')
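One caveat with the chunked append: `mode='a'` writes the column header again for every chunk, so the resulting csv contains repeated header rows. A hedged sketch of one way to handle it — write the header only for the first chunk (the in-memory SQLite `my_table` and the relative output path are illustrative stand-ins for the real connection and path):

```python
import sqlite3
import pandas as pd

# Stand-in for the real database: a small table with columns a, b, c.
connection = sqlite3.connect(":memory:")
pd.DataFrame({"a": range(7), "b": range(7), "c": range(7)}).to_sql(
    "my_table", connection, index=False
)

# Overwrite with a header on the first chunk, then append without one.
first = True
for chunk in pd.read_sql("select a,b,c from my_table", con=connection,
                         chunksize=3):
    chunk.to_csv("file.csv", index=False,
                 mode="w" if first else "a", header=first)
    first = False
```

The same `header=first` trick applies unchanged with `chunksize=10**5` against the real table.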