
Retrieving Data from MySQL in batches via Python

I would like to run this process in batches because of the volume of data.

Here's my code:

 getconn = conexiones()
 con = getconn.mysqlDWconnect()
 with con:
     cur = con.cursor(mdb.cursors.DictCursor)
     cur.execute("SELECT id, date, product_id, sales FROM sales")
     rows = cur.fetchall()

How can I implement an index to fetch the data in batches?

First point: a Python DB-API cursor can be iterated over directly, so unless you really need to load a whole batch into memory at once, you can start by using this feature, i.e. instead of:

cursor.execute("SELECT * FROM mytable")
rows = cursor.fetchall()
for row in rows:
   do_something_with(row)

you could just do:

cursor.execute("SELECT * FROM mytable")
for row in cursor:
   do_something_with(row)
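
If you do want explicit batches rather than one row at a time, the DB-API also defines cursor.fetchmany(size). The following is a minimal sketch, not code from the thread: iter_batches is a hypothetical helper, and an in-memory sqlite3 database stands in for the MySQL connection only so that the example runs on its own.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows from an already executed cursor."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            # an empty result means the cursor is exhausted
            break
        yield rows


# sqlite3 is only a stand-in here; with MySQL you would pass your own cursor
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sales INTEGER)")
con.executemany("INSERT INTO sales (sales) VALUES (?)", [(n,) for n in range(10)])

cursor = con.cursor()
cursor.execute("SELECT id, sales FROM sales ORDER BY id")
batch_lengths = [len(batch) for batch in iter_batches(cursor, 4)]
print(batch_lengths)
```

The helper only relies on fetchmany, so it should work unchanged with any DB-API cursor, whichever placeholder style the connector uses.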

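Then, if your DB connector's implementation still doesn't make proper use of cursor iteration (MySQLdb's default cursor class, for instance, buffers the whole result set on the client, while its SSCursor and SSDictCursor classes stream rows from the server), it will be time to add LIMIT and OFFSET to the mix: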

# py2 / py3 compat
try:
    # xrange is defined in py2 only
    xrange
except NameError:
    # py3 range is actually py2 xrange
    xrange = range

cursor.execute("SELECT count(*) FROM mytable")
count = cursor.fetchone()[0]
batch_size = 42 # whatever

for offset in xrange(0, count, batch_size):
    cursor.execute(
        "SELECT * FROM mytable LIMIT %s OFFSET %s",
        (batch_size, offset))
    for row in cursor:
        do_something_with(row)
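
Two caveats apply to LIMIT / OFFSET. Without an ORDER BY the row order is not guaranteed to be the same from one query to the next, and the server typically has to walk past all skipped rows, so later batches get slower on large tables. A common alternative is to page by key instead. The sketch below is an illustration under assumptions, not part of the original answer: it assumes id is a unique, indexed, positive integer column, iter_keyset_batches is a hypothetical helper, and sqlite3 (with ? placeholders where MySQLdb uses %s) stands in for MySQL so the example is runnable.

```python
import sqlite3


def iter_keyset_batches(con, batch_size):
    """Yield batches of (id, sales) rows, resuming after the last id seen."""
    last_id = 0  # assumption: ids are positive integers
    while True:
        rows = con.execute(
            "SELECT id, sales FROM sales WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        # remember where this batch ended so the next query starts after it
        last_id = rows[-1][0]


# sqlite3 is only a stand-in for the MySQL connection
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sales INTEGER)")
con.executemany("INSERT INTO sales (id, sales) VALUES (?, ?)",
                [(i, i * 10) for i in range(1, 8)])

keyset_batches = list(iter_keyset_batches(con, 3))
print([[row[0] for row in batch] for batch in keyset_batches])
```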

You can use

SELECT id, date, product_id, sales FROM sales LIMIT X OFFSET Y;

where X is the batch size you need and Y is the current offset (for example, X times the number of batches already fetched).
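
To make the arithmetic concrete, here is a small sketch that lists the (X, Y) pairs for a table of a given size. batch_windows is a hypothetical helper name and the row count of 25 is just an example value.

```python
def batch_windows(total_rows, batch_size):
    """Return the (X, Y) = (LIMIT, OFFSET) pair for each batch."""
    # ceiling division: the last batch may be only partly filled
    batch_count = (total_rows + batch_size - 1) // batch_size
    return [(batch_size, i * batch_size) for i in range(batch_count)]


windows = batch_windows(25, 10)
print(windows)
```

For 25 rows and X = 10 this gives three queries with Y = 0, 10 and 20; the last one returns only the remaining 5 rows.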

To expand on akalikin's answer, you can use a stepped iteration to split the query into chunks, and then use LIMIT and OFFSET to execute each chunk.

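cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT COUNT(*) AS total FROM sales")
# DictCursor returns each row as a dict, so read the count by its alias
total = cur.fetchone()["total"]
batch_size = 5

for offset in range(0, total, batch_size):
    # pass LIMIT / OFFSET as query parameters instead of formatting them into the SQL
    cur.execute(
        "SELECT id, date, product_id, sales FROM sales LIMIT %s OFFSET %s",
        (batch_size, offset))
    rows = cur.fetchall()
    print(rows)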

Thank you, here's how I implemented it with your suggestions:

dataset = []
batch_size = 10
index = 0

# open the connection once and reuse it for every batch
getconn = conexiones()
con = getconn.mysqlDWconnect()
with con:
    cur = con.cursor(mdb.cursors.DictCursor)
    while True:
        cur.execute(
            "SELECT id, date, product_id, sales FROM sales LIMIT %s OFFSET %s",
            (batch_size, batch_size * index))
        rows = cur.fetchall()
        if len(rows) == 0:
            # an empty batch means every row has been read
            break
        dataset.extend(rows)
        index += 1
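
One thing to keep in mind: collecting every row into dataset still ends up with the whole table in memory. If the goal is to limit memory use, each batch can be processed and then discarded. The following is only a sketch under assumptions: iter_offset_batches is a hypothetical helper, the per-batch work is a simple sum, and sqlite3 (with ? placeholders where MySQLdb uses %s) stands in for the MySQL connection so the example runs by itself.

```python
import sqlite3


def iter_offset_batches(con, batch_size):
    """Yield successive LIMIT/OFFSET batches until an empty one comes back."""
    offset = 0
    while True:
        rows = con.execute(
            "SELECT id, sales FROM sales ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += batch_size


# sqlite3 is only a stand-in for the MySQL connection
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sales INTEGER)")
con.executemany("INSERT INTO sales (sales) VALUES (?)", [(n,) for n in range(1, 26)])

total_sales = 0
batch_count = 0
for batch in iter_offset_batches(con, 10):
    # only the current batch is held in memory at this point
    total_sales += sum(row[1] for row in batch)
    batch_count += 1
print(batch_count, total_sales)
```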


 