What is the best way to fetch huge data from MySQL with SQLAlchemy?
I want to process more than 10 million rows stored in MySQL. So I wrote this to split the query into several parts and then concatenate the results for later processing. It works well if count < 2 million. However, as count rises, the time SQLAlchemy takes grows much longer.
import gc

from sqlalchemy import create_engine, text

# db_config is assumed to be defined elsewhere in the application.


def fetch_from_sql(_sql_pat, count):
    """
    :param _sql_pat: SELECT id, data FROM a.b LIMIT {},{};
    :param count: how many rows you want to fetch from mysql
    :return: generator
    """
    def gen_connect(sql):
        # A new engine and connection are created for every chunk.
        __engine = create_engine(db_config['SQLALCHEMY_DATABASE_URI'])
        with __engine.connect() as c:
            # text() keeps the raw SQL string usable on newer SQLAlchemy versions.
            for row in c.execute(text(sql)):
                yield row

    def gen_range(limit, step):
        # Yield (offset, row_count) pairs covering rows 0..limit-1 with no gaps.
        for start in range(0, limit, step):
            yield start, min(step, limit - start)

    sqls = [_sql_pat.format(start, step) for start, step in gen_range(count, 100000)]
    sources = (gen_connect(sql) for sql in sqls)
    for s in sources:
        for item in s:
            yield item
        gc.collect()
The question is why SQLAlchemy takes more and more time (I logged the timestamps and posted them below), and what is the best way to deal with this situation?
Dumped 10000 items, at 2016-10-08 11:55:33
Dumped 1000000 items, at 2016-10-08 11:59:23
Dumped 2000000 items, at 2016-10-08 12:05:07
Dumped 3000000 items, at 2016-10-08 13:54:05
This is because you're using LIMIT / OFFSET, so when you specify an offset of 3000000, for example, the database has to skip over 3000000 records. It still reads and discards those rows before returning the chunk you asked for, so each successive chunk is slower than the one before.
The correct way to do this is to ORDER BY some indexed column, such as the primary key id column, and then filter with WHERE id > :last_fetched_id, where :last_fetched_id is the last id returned by the previous chunk.
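The ORDER BY id / WHERE id > :last_fetched_id approach can be sketched roughly as follows. This is a minimal illustration, not the asker's actual setup: it uses the standard-library sqlite3 module with an in-memory table as a stand-in so it runs without a MySQL server. The table name b, the helper name fetch_by_keyset, and the assumption that ids are positive integers are all hypothetical. The same query shape should carry over to MySQL.

```python
import sqlite3


def fetch_by_keyset(conn, chunk_size=100000):
    """Yield (id, data) rows in primary-key order, one chunk at a time.

    Hypothetical helper: instead of an OFFSET, each query starts right
    after the last id seen, so the index can seek straight to the chunk.
    """
    last_id = 0  # assumption: ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, data FROM b "
            "WHERE id > :last_id ORDER BY id LIMIT :chunk",
            {"last_id": last_id, "chunk": chunk_size},
        ).fetchall()
        if not rows:
            break  # no rows left past last_id
        for row in rows:
            yield row
        last_id = rows[-1][0]  # resume after the last id of this chunk


# Demo on an in-memory SQLite table whose ids have gaps (1, 3, 5, ..., 25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO b (id, data) VALUES (?, ?)",
    [(i, "row-%d" % i) for i in range(1, 26, 2)],
)
fetched = list(fetch_by_keyset(conn, chunk_size=4))
print(len(fetched), fetched[0], fetched[-1])
```

Because each chunk is located by id rather than by position, gaps in the id sequence do not matter, and the cost of a chunk should not grow with how many rows were fetched before it.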