
How do cursors work in Python's DB-API?

I have been using Python with RDBMSs (MySQL and PostgreSQL), and I have noticed that I really do not understand how to use a cursor.

Usually, one has a script connect to the DB via a client DB-API module (like psycopg2 or MySQLdb):

connection = psycopg2.connect(host='otherhost', etc)

And then one creates a cursor:

cursor = connection.cursor()

And then one can issue queries and commands:

cursor.execute("SELECT * FROM etc")

Now where is the result of the query, I wonder? Is it on the server? Or a little on my client and a little on my server? And then, if we need to access some results, we fetch 'em:

rows = cursor.fetchone() 

or

rows = cursor.fetchmany()

Now let's say I do not retrieve all the rows and decide to execute another query. What will happen to the previous results? Is there an overhead?

Also, should I create a cursor for every form of command and continuously reuse it for those same commands somehow? I hear psycopg2 can somehow optimize commands that are executed many times but with different values. How, and is it worth it?

Thx

ya, i know it's months old :P

DB-API's cursor appears to be closely modeled after SQL cursors. As far as resource (row) management is concerned, DB-API does not specify whether the client must retrieve all the rows or DECLARE an actual SQL cursor. As long as the fetchXXX interfaces do what they're supposed to, DB-API is happy.
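To make the fetchXXX interface concrete without needing a database server, here is a minimal sketch that uses the standard library's sqlite3 module as a stand-in driver. That substitution is an assumption for illustration (the question is about psycopg2 and MySQLdb, and where the rows are buffered differs per driver), and the table name `t` is made up. It shows fetchone, fetchmany, and what happens to unfetched rows when the same cursor executes another query.

```python
import sqlite3

# sqlite3 stands in for psycopg2/MySQLdb here (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(10)])

cur.execute("SELECT n FROM t ORDER BY n")
first = cur.fetchone()    # a single row (a tuple), or None when exhausted
batch = cur.fetchmany(3)  # a list of up to 3 further rows

# Re-executing on the same cursor replaces the pending result set;
# the rows that were never fetched are no longer reachable through it.
cur.execute("SELECT n FROM t WHERE n >= 8 ORDER BY n")
rest = cur.fetchall()     # only rows from the second query

conn.close()
```

If two result sets need to be read at the same time, the usual approach is a second cursor on the same connection rather than reusing one.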

As far as psycopg2 cursors are concerned (as you may well know), "unnamed DB-API cursors" will fetch the entire result set, which AFAIK is buffered in memory by libpq. "Named DB-API cursors" (a psycopg2 concept that may not be portable) will request the rows on demand (via the fetchXXX methods).
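A rough sketch of the named-cursor approach, assuming psycopg2: passing `name=` to `connection.cursor()` is what asks for a server-side cursor, and `fetchmany()` then pulls rows in batches. The helper name `stream_rows`, the cursor name, and the batch size are made up for illustration; this is not the only way to do it.

```python
def stream_rows(connection, query, params=None, batch_size=1000):
    """Yield rows one at a time, pulling them from the server in batches."""
    # Passing name= is the psycopg2-specific part: it creates a server-side
    # ("named") cursor. It needs an open transaction, i.e. not autocommit.
    cursor = connection.cursor(name="stream_rows_cursor")
    try:
        cursor.execute(query, params)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            yield from rows
    finally:
        cursor.close()

# Hypothetical usage (connection details and table name are placeholders):
# import psycopg2
# conn = psycopg2.connect(host="otherhost", dbname="mydb")
# for row in stream_rows(conn, "SELECT * FROM big_table"):
#     print(row)
```

With an unnamed cursor the same loop would still work, but the whole result set would already be sitting in client memory after execute().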

As cited by "unbeknown", executemany can be used to optimize multiple runs of the same command. However, it doesn't address the need for prepared statements; when repeated executions of a statement with different parameter sets are not directly sequential, executemany() will perform just as well as execute(). DB-API does "provide" driver authors with the ability to cache executed statements, but its implementation (what's the scope/lifetime of the statement?) is undefined, so it's impossible to set expectations across DB-API implementations.

If you are loading lots of data into PostgreSQL, I would strongly recommend trying to find a way to use COPY.
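One possible way to use COPY from Python, sketched under the assumption that the driver is psycopg2 (whose cursors have a `copy_expert()` method). The helper name `copy_rows` is made up, and the table `mytable (col1, col2)` is a placeholder that has to exist already.

```python
import csv
import io

def copy_rows(cursor, rows):
    """Bulk-load (col1, col2) tuples into mytable with a single COPY."""
    buffer = io.StringIO()
    # Serialize the rows as CSV in memory, one line per row.
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    buffer.seek(0)
    # copy_expert() is psycopg2-specific; it streams the file-like object
    # to the server as the COPY ... FROM STDIN data.
    cursor.copy_expert(
        "COPY mytable (col1, col2) FROM STDIN WITH (FORMAT csv)", buffer
    )

# Hypothetical usage with an open psycopg2 connection `conn`:
# with conn.cursor() as cur:
#     copy_rows(cur, [("val1", 1), ("val2", 2)])
# conn.commit()
```

For very large loads it would make more sense to stream from a file on disk than to build the whole CSV in memory.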

Assuming you're using PostgreSQL, the cursors probably are just implemented using the database's native cursor API. You may want to look at the source code for pg8000, a pure Python PostgreSQL DB-API module, to see how it handles cursors. You might also like to look at the PostgreSQL documentation for cursors.

When you look at the mysqldb documentation you can see that they implemented different strategies for cursors. So the general answer is: it depends.

Edit: Here is the mysqldb API documentation. There is some info on how each cursor type behaves. The standard cursor stores the result set in the client. So I assume there is an overhead if you don't retrieve all result rows, because even the rows you don't fetch have to be transferred to the client (potentially over the network). My guess is that it is not that different from postgresql.

When you want to optimize SQL statements that you call repeatedly with many values, you should look at cursor.executemany(). It takes one SQL statement plus a sequence of parameter sets. Whether that saves any parsing depends on the driver: some batch or prepare the statement, others simply loop over execute() internally:

cur.executemany('INSERT INTO mytable (col1, col2) VALUES (%s, %s)',
                [('val1', 1), ('val2', 2)])
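A runnable version of the call above, using the standard library's sqlite3 as a stand-in driver (an assumption for illustration; sqlite3 uses `?` placeholders where psycopg2 and MySQLdb use `%s`). It shows that executemany() stores the same rows as an explicit loop of execute() calls. Any speed difference between the two depends on the driver and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (col1 TEXT, col2 INTEGER)")

params = [("val1", 1), ("val2", 2)]
insert_sql = "INSERT INTO mytable (col1, col2) VALUES (?, ?)"

# One call, many parameter sets.
cur.executemany(insert_sql, params)

# The equivalent explicit loop, one execute() per parameter set.
for p in params:
    cur.execute(insert_sql, p)

cur.execute("SELECT col1, col2 FROM mytable ORDER BY col2, rowid")
loaded = cur.fetchall()  # each parameter set appears twice
conn.close()
```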
