PostgreSQL query taking too long via Python

Question

#!/usr/bin/env python
import pika
import psycopg2

# hostname, username, password and database are assumed to be defined elsewhere.


def doQuery(conn, i):
    # Fetch one page of rows starting at offset i.
    cur = conn.cursor()
    cur.execute("SELECT * FROM table OFFSET %s LIMIT 100000", (i,))
    return cur.fetchall()


print("Using psycopg2")
myConnection = psycopg2.connect(host=hostname, user=username,
                                password=password, dbname=database)

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue2')

endloop = False
i = 1
while True:
    results = doQuery(myConnection, i)

    j = 0
    while j < 10000:
        try:
            results[j][-1]
        except IndexError:
            # Ran past the last row of this page: nothing left to publish.
            endloop = True
            break

        message = str(results[j][-1]).encode("hex")

        channel.basic_publish(exchange='',
                              routing_key='task_queue2',
                              body=message)
        # To make messages persistent, also pass
        # properties=pika.BasicProperties(delivery_mode=2).

        j = j + 1

    # if i % 10000 == 0:
    #     print(i)

    # Stop once a page came back short; otherwise move on to the next page.
    if endloop:
        break

    i = i + 10000

The SQL query takes too long to execute once i reaches 100,000,000, but I have about two billion entries that I need to put into the queue. Does anyone know of a more efficient SQL query I can run so that I can get all two billion entries into the queue faster?

Answer 1

psycopg2 supports server-side cursors, that is, cursors that are managed on the database server rather than in the client. The full result set is not transferred to the client all at once; instead, it is fed to the client as required via the cursor interface.

This will allow you to perform the query without paging (which is what LIMIT/OFFSET implements) and will simplify your code. To use a server-side cursor, pass the name parameter when creating the cursor.

import pika
import psycopg2

with psycopg2.connect(host=hostname, user=username, password=password, dbname=database) as conn:
    with conn.cursor(name='my_cursor') as cur:    # create a named server-side cursor
        cur.execute('select * from table')

        connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        channel.queue_declare(queue='task_queue2')

        for row in cur:
            message = str(row[-1]).encode('hex')
            channel.basic_publish(exchange='', routing_key='task_queue2', body=message)

You might want to tweak cur.itersize to improve performance if necessary.
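As a minimal sketch of that tuning, assuming a psycopg2 connection opened as in the code above: the helper stream_rows below is hypothetical (it is not part of psycopg2) and simply sets itersize on a named cursor before iterating. psycopg2 documents itersize as the number of rows fetched from the server per network round trip while iterating a named cursor, with a default of 2000. The value 10000 used here is only an illustrative guess; a suitable number depends on row size, memory and network latency.

```python
def stream_rows(conn, query, itersize=10000, cursor_name='my_cursor'):
    """Yield rows one at a time from a named (server-side) cursor.

    conn is assumed to be an open psycopg2 connection; stream_rows and
    its parameters are illustrative names, not a psycopg2 API.
    """
    # Passing name= asks psycopg2 for a server-side cursor.
    with conn.cursor(name=cursor_name) as cur:
        # Rows fetched per round trip during iteration (psycopg2 default: 2000).
        cur.itersize = itersize
        cur.execute(query)
        for row in cur:
            yield row
```

With the connection and channel from the code above, the publishing loop could then read "for row in stream_rows(conn, 'select * from table'):" followed by the same basic_publish call. Larger itersize values mean fewer round trips at the cost of more client memory per batch.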

Answer 1
0 2017-09-28 13:01:26
