
What is the best way to fetch huge data from mysql with sqlalchemy?

I want to process over 10 million rows stored in MySQL, so I wrote the function below to slice the SQL into several parts and then concatenate the data for later processing. It works well when count < 2 million, but as count grows, the time SQLAlchemy takes gets much longer.

import gc

from sqlalchemy import create_engine


def fetch_from_sql(_sql_pat, count):
    """
    :param _sql_pat: SELECT id, data FROM a.b LIMIT {},{};
    :param count: how many rows to fetch from MySQL
    :return: generator
    """
    def gen_connect(sql):
        # open a connection per slice and stream its rows
        __engine = create_engine(db_config['SQLALCHEMY_DATABASE_URI'])
        with __engine.connect() as c:
            for row in c.execute(sql):
                yield row

    def gen_range(limit, step):
        # yield (offset, row_count) pairs covering [0, limit]
        if step > limit:
            yield 0, limit
        else:
            R = range(0, limit + 1, step)
            for idx, v in enumerate(R):
                if idx == 0:
                    yield v, step
                elif limit - v >= step:
                    yield v + 1, step
                else:
                    yield v + 1, limit - v

    sqls = [_sql_pat.format(start, step) for start, step in gen_range(count, 100000)]
    sources = (gen_connect(sql) for sql in sqls)
    for s in sources:
        for item in s:
            yield item
        gc.collect()
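
For context, a minimal sketch of how this generator might be driven to produce the progress log further below; the SQL pattern is taken from the docstring, but the logging loop itself is an assumption, not part of the original code:

from datetime import datetime

sql_pat = "SELECT id, data FROM a.b LIMIT {},{};"
dumped = 0
for row in fetch_from_sql(sql_pat, count=10000000):
    dumped += 1  # replace with real processing of (id, data)
    if dumped % 1000000 == 0:
        print("Dumped {} items, at {}".format(dumped, datetime.now()))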

The question is why SQLAlchemy takes more and more time (I logged the times and post them below), and what is the best way to deal with this situation?

Dumped 10000 items, at 2016-10-08 11:55:33
Dumped 1000000 items, at 2016-10-08 11:59:23
Dumped 2000000 items, at 2016-10-08 12:05:07
Dumped 3000000 items, at 2016-10-08 13:54:05

This is because you're using LIMIT / OFFSET: when you specify offset 3000000, for example, the database has to scan past and discard 3,000,000 records before it can return the rows you asked for, so each successive batch gets slower.

The correct way to do this is to ORDER BY an indexed column, such as the primary key id column, and then filter with WHERE id > :last_fetched_id.
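
A hedged sketch of that keyset-pagination approach, reusing the table and column names from the question (the function name, connection URI, and batch size are placeholders to adapt to your setup):

from sqlalchemy import create_engine, text

def fetch_by_keyset(uri, batch_size=100000):
    """Yield rows in primary-key order, batch_size rows per round trip."""
    engine = create_engine(uri)
    query = text(
        "SELECT id, data FROM a.b "
        "WHERE id > :last_id ORDER BY id LIMIT :batch"
    )
    last_id = 0
    with engine.connect() as conn:
        while True:
            rows = conn.execute(
                query, {"last_id": last_id, "batch": batch_size}
            ).fetchall()
            if not rows:
                break
            for row in rows:
                yield row
            # resume from the last primary key seen instead of an OFFSET
            last_id = rows[-1][0]

Because each batch starts from an index seek on id rather than scanning and discarding offset rows, the per-batch cost stays roughly constant no matter how deep into the table you are.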
